and Technical Information held in Moscow in 1967. The author
discusses the construction of a terminological machine dictionary,
i.e., thesauri, and the interrelationship between machine input
and retrieval language, utilizing thesaurus terms. Automatic
indexing and reviewing are discussed on a general basis, with the
author stating that the USSR has achieved favorable prerequisites for
successful advancement in this area. The USSR appears to have
many problems providing information processing systems with machine
technology, such as the comparatively low class of series-produced
domestic ETsVM, insufficient volumes of storage units, poorly developed
parallelism of devices, and low reliability, especially of external
devices.
(U) This paper was presented at the All Union Conference on
Information Retrieval Systems and Automatic Processing of Scientific
and Technical Information held in Moscow in 1967. The author states
that up to 1965 there was no training of specialists to work in
and further develop a system of scientific and technical information.
In 1965 the Ministry of Higher and Special Secondary Education
created, in the higher school system, specialty No. 0640: automation
and mechanization of processes of processing and delivery of
information. Since the establishment of specialty No. 0640,
improvements to the curricula have been developed, such as the
following: disciplines of the physicomathematical and engineering
cycles must not contain obsolete information; a mathematics course
must reflect the basis of mathematical logic, the principles of the
calculus of variations, probability and information theories, and
other divisions needed to increase the mathematical level of the
specialists in automation; the contents of disciplines in theoretical
mechanics, strength of materials, theory of machines and mechanisms,
descriptive geometry and drawing, and other general-engineering
disciplines not directly related to specialties in automation should
be radically examined; and the general and special education of
future automation engineers in electrical-engineering disciplines,
electronics, and the general theory of automatic control should be
strengthened.
(U) Two Uniterm retrieval systems are analyzed: Pusto-Nepusto-2
and Pusto-Nepusto-4. The author states that the comparison rules of
the Pusto-Nepusto-2 system rest on different assumptions from those
of the Pusto-Nepusto-4 system. These assumptions are formulated as
follows: if an inquiry descriptor is replaced in a document with a
lower descriptor, this in no way reflects on the relevance of the
document; if a document has, for a certain inquiry descriptor, not
only an equal or lower descriptor but also a higher one, then this in
no way affects document relevance; if the document has, for a certain
inquiry descriptor, neither an equal nor a lower one, but at least
one higher one, then this lowers document relevance but does not make
it equal to zero. Pusto-Nepusto-2 logic can split delivery into 2
echelons, against the 4 echelons of the Pusto-Nepusto-4 system. The
author concludes that the Pusto-Nepusto-2 system is in the
experimental exploitation stage and that it is too early to draw
conclusions about the results of the conducted reorganization of the
logic of the two systems.
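
The three assumptions above can be sketched in code. This is only an illustrative reading of the abstract, not the Pusto-Nepusto-2 implementation: the descriptor hierarchy, the names PARENT, higher_descriptors, and echelon, and the treatment of a document with no related descriptor at all (it is not delivered) are assumptions introduced here.

```python
from typing import Dict, Optional, Set

# Hypothetical hierarchy for illustration: each descriptor maps to the
# next higher (broader) descriptor, or to None at the top.
PARENT: Dict[str, Optional[str]] = {
    "computer": None,
    "digital computer": "computer",
    "Minsk 22": "digital computer",
}


def higher_descriptors(term: str) -> Set[str]:
    """Return every descriptor strictly higher than `term`."""
    found: Set[str] = set()
    current = PARENT.get(term)
    while current is not None:
        found.add(current)
        current = PARENT.get(current)
    return found


def echelon(inquiry: Set[str], document: Set[str]) -> Optional[int]:
    """Return 1 or 2 for the delivery echelon, or None if not delivered."""
    lowered = False
    for wanted in inquiry:
        # Rules 1 and 2: an equal or lower descriptor satisfies the inquiry
        # descriptor, whether or not a higher one is also present.
        if any(d == wanted or wanted in higher_descriptors(d) for d in document):
            continue
        # Rule 3: only a higher descriptor is present, so relevance is
        # lowered but not zero.
        if any(d in higher_descriptors(wanted) for d in document):
            lowered = True
            continue
        # Assumption (not in the source): no related descriptor at all
        # means the document is not delivered.
        return None
    return 2 if lowered else 1
```

Under these assumptions, a document indexed with "Minsk 22" falls in the first echelon for the inquiry "digital computer", while a document indexed only with "computer" falls in the second.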
(U) This paper was presented at the All Union Conference on
Information Retrieval Systems and Automatic Processing of Scientific
and Technical Information held in Moscow in 1967. The author
discusses the construction of automated information-retrieval
systems of the descriptor type on the Minsk 22 and the Minsk 2
digital computers. The paper examines an improved variant of the
Setka-5 information processing system on the Minsk 22 with use
of the socket associative-address method of organization of
information. Also examined in the paper are the creation of a
dictionary of descriptors for the thematic division "Computer
Technology"; construction of a machine dictionary of descriptors of
an automated IPS; processing of inquiries and their input into the
ETsVM; recording of initial information in the ETsVM; and a
description of the algorithm of work of the IPS.
(U) This paper was presented at the All Union Conference on
Information Retrieval Systems and Automatic Processing of Scientific
and Technical Information held in Moscow in 1967. The author
reports on the development of an information retrieval language of
the descriptor type for the division of computer technology.
Conclusions of the article indicate that the developed information
retrieval language (IPYa) has satisfactory characteristics; growth
of the dictionary slows considerably once the array grows beyond
2000 documents; input of grammatical means into IPYa is inexpedient
for small arrays; the question regarding the need to introduce
grammatical means into the developed IPYa will be
, in 1967; and distribution of descriptors in retrieval patterns
of documents obeys the same Zipf and Mandelbrot laws as words
in natural-language texts.
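
For reference, the Zipf-Mandelbrot law is commonly written as f(r) proportional to 1 / (r + q)^s, where r is the rank of an item by frequency and q and s are fitted constants; with q = 0 and s = 1 it reduces to Zipf's law. The sketch below is an illustration only and is not taken from the paper: the function name and the parameter values are assumptions chosen for the example.

```python
from typing import List


def zipf_mandelbrot(n: int, s: float = 1.0, q: float = 0.0) -> List[float]:
    """Relative frequencies for ranks 1..n, with f(r) proportional to 1 / (r + q)**s."""
    if n < 1:
        raise ValueError("n must be at least 1")
    weights = [1.0 / (rank + q) ** s for rank in range(1, n + 1)]
    total = sum(weights)
    # Normalize so that the frequencies sum to 1.
    return [w / total for w in weights]


# Pure Zipf case (q = 0, s = 1) for a hypothetical dictionary of 5 descriptors.
freqs = zipf_mandelbrot(5)
```

In the pure Zipf case the most frequent descriptor is expected to occur twice as often as the second and three times as often as the third; a nonzero q flattens the head of the distribution.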
PREFACE


The Party and Government are giving their constant attention
to the development and improvement of systems of the information
service. The special resolution adopted by the Council of Ministers
of the USSR in November 1966 is directed towards implementing
solutions to the problems set by the XXIII Congress of the CPSU
on creation of a government-wide, highly effective scientific-technical
information service.

Being based on contemporary technical means for acquisition,
processing, investigation and delivery of information data, and
automation of information processes, the State system of scientific
information is directed to provide timely information about
achievements in domestic and foreign science and technology.

At  present  in  many  organizations  of  the  country  scientific 
research  work  is  being  conducted  in  the  area  of  automation  of 
information  processes.  The  level  and  state  of  this  work  require 
defined  organization  and  coordination.  This  is  especially  important 
since  scientific  and  technical  information  is  such  a  many-faceted 
branch  of  science  and  technology,  embracing  all  branches  of  the 
national  economy,  that  its  further  effective  development  would  be 
inconceivable without coordination of the work and exchange of the
experience  of  all  specialists  occupied  with  automation  of  information 
processes. 
The Third All-Union Conference on information retrieval systems
and automated processing of scientific and technical information,
conducted from 19 through 22 December 1966 in Moscow, provided
results of scientific and technical information activity of scientists
and specialists and outlined the way for the fullest and fastest
fulfillment of problems set by the Party and Government in the area
of scientific and technical information.

The work of the conference involved the participation of 1150
specialists representing different ministries, departments, information
organs, scientific research and design organizations, industrial
enterprises and establishments of the country. At the conference
220 reports were presented.

The present Transactions contain material presented at the
plenary sessions of the conference, as well as at meetings of the
separate sections.

For  the  convenience  of  readers,  the  Transactions  are  being 
issued  in  four  volumes. 

The first volume, "Information Retrieval Systems," presents the
results of research and development in the following areas: semantic
systems of investigation of scientific and technical literature in
large data bases, with automatic translation from a natural into a
formalized language; automated systems of factographic facilities
based on the use of information languages of the natural sciences;
information retrieval systems for industrial enterprises and
establishments.

The same volume is devoted to other questions connected with the
creation and introduction of automated information retrieval systems
of different classes and purposes for processing large as well
as small volumes of scientific and technical information.

The second volume, "Semiotic Problems of Automated Data
Processing," presents material dedicated to: development of problems
of the connection between syntactic and semantic properties of
language systems; investigation of the natural and formalized
languages of science and technology in connection with problems of
storage and retrieval of information; questions of automatic processing
of texts for creating operational systems of machine indexing,
abstracting and translation of texts; research in the area of creation
of special programming languages and translators from them for machine
processing of texts.

The third volume, "Automatic Reading Devices," presents results of
research and development of different methods and devices for automatic
identification of typographical and typewritten symbols. Special
attention is allotted here to automatic reading machines, allowing
automated computer input of masses of scientific and technical,
statistical and economic information to ETsVM [electronic digital
computers].

The fourth volume, "Technical Devices for the Information
Service and On-Line Reproduction Techniques," presents works
related to research and development of: technical devices for
preparation and input of alphanumeric information into a computer
and high-speed output devices for textual information, and also
output devices with many characters and high-quality print;
specialized memory units for information systems, possessing
internal logic and the possibility of storage of large volumes of
information, including photoscopic and associative units and units
with internal logic; retrieval devices on continuous carriers
(microfilms) and discrete carriers (microphoto cards, magnetic cards,
etc.); technical means for mechanization and automation of all stages
of information processes.

A considerable place in this volume is given to questions on
organization of industry and improvement of technological processes
on output of urgent information publications with wide application
of methods of the classical printing industry and reproduction
technology, the use of the latest models of typesetting typewriters
with stored control for manufacture of original mock-ups suitable for
direct reproduction, application of xerographic and electronic
equipment for urgent manufacture of printing forms, and also questions
of the use of high-speed cylinder and offset machines for printing
information.

Material is given on the use of highly productive brochure
equipment.
THE ROLE AND PLACE OF THE MACHINE IN SCIENTIFIC
AND TECHNICAL INFORMATION

A. I. Mikhaylov

The "... all possible assistance to further strengthening of
the role of science in the building of Communist society ...
standard statement of scientific and technical information, and
the whole system of study and propagation of domestic and foreign
advanced experience" provided for by the program of the CPSU has
been reflected during the last few years in a number of resolutions
of the Central Committee of the Communist Party of the Soviet
Union and the Council of Ministers of the USSR directed towards
creation in our country of a standard system of scientific and
technical information.

At the 23rd Congress of the CPSU comrade A. N. Kosygin said:
"Technical progress in the national economy and successes of
science to a large extent depend on a well supplied system of
information about results of scientific investigations conducted
in this country and abroad, about the achievements and new methods
of production, and about inventions and proposed innovations.

We must create in this country a highly effective state system
of scientific information."

Recent years have been distinguished by vigorous activity in
the field of development of scientific and technical information in
all links of the national economy.
Transition to a new system of leadership and planning of the
national economy also requires reconstruction in the area of
state leadership of scientific and technical information. The
resolution of the Council of Ministers of the USSR of
29 November 1966 is a program of works directed towards creation
of a state system of scientific and technical information most
fully responding to further development of progress in our country.

During the last few years a wide front of development of
scientific information has unfolded. This is confirmed by a huge
number of publications and by a large number of international,
regional, and national scientific conferences, symposia, etc. The
main direction of these investigations is the search for ways to
speed up processes of information activity.

In spite of successes attained in the area of scientific
investigations and design developments, the need has now become
evident to evaluate critically the results in the area of
development of scientific bases and new technical means, both from
the point of view of satisfaction of the emerging requirements of
consumers of information and from the point of view of the overall
solution of the whole information problem.

A deficiency of scientific activity in the past (in the
experience of VINITI [All-Union Institute of Scientific and
Technical Information]) was a preoccupation with individual special
problems, which sometimes hurt the development of general
theoretical problems.

At  present  more  and  more  attention  is  being  paid  to  questions 
of development of the theory of processes of scientific information.
And  the  basis  of  information  activity  should  be  the  theory  of 
processes,  not  the  empirical  and  intuitive  method. 

We  do  not  as  yet  have  an  acceptable  name  for  the  scientific 
discipline  which  studies  the  structure  and  properties  of  scientific 
information;  the  regularity  of  information  activity;  and  the  theory, 
history,  and  method  and  organization  of  information.  The  term 
narrow and inconvenient. It seems to us that the most suitable
term to use would be the word "informatika" [no translation found].

A similar proposal is argued in an article published in No. 12
of "NTI" [Scientific and Technical Information] in 1966. From now
on we will use this term.

By "informatika," this new discipline, we mean the following
content: the study of the laws, methods, and means of collection;
analytic and synthetic processing; storage; retrieval; and
propagation of scientific information.

In informatika much has already been done to give contemporary
machine technology a fitting place in scientific and information
activity. However, the role and place of the machine in
informatika is sometimes understood in too simplified a way, and in
investigations and experiments with the use of ETsVM not all trends
have been sufficiently developed.

The Role of ETsVM in Information Processes
(The Time Factor, Not Quantitative Growth)

Formation of scientific and technical information as an
independent scientific trend appeared as a result not only of the
exponential growth of scientific and technical literature, but
also, in addition to that (and perhaps even mainly), as a result
of peculiarities of development of scientific and technical progress.
Contemporary science and technology have two characteristic
peculiarities.

1. Ever increasing complexity of scientific and technical
problems, the solution of which is possible only through the
efforts of a large pool of scientists and engineers of various
specialties. These pools need to be provided with information.
Providing it takes on more and more the character of queueing.
Effective solution of this problem requires the application of means
of mechanization and automation.
2. Fast reduction of periods of development, mastering, and
introduction of various discoveries and inventions.

The president of the Academy of Sciences of the USSR,
M. V. Keldysh, in the article "Natural sciences and their value
for development of Weltanschauung and technical progress"
("Kommunist", 1966, No. 17) writes: "Those who say that the
century of science and technology has just now started are not
entirely correct. After all, in all times material and technical
progress was largely based on the development of science. The
roots of the industrial revolution also lay in science. The
last century was characterized by a very large number of the greatest
scientific achievements. It is incorrect to think that earlier
there were few discoveries in the field of natural science entailing
great consequences in the region of material production and that
we have only now entered the century of continuous discoveries.

However, there is one feature very characteristic of contemporary
development of science and technology: the speed of practical use
of scientific discoveries. It is possible to give a number of
examples. The period from the discovery of the electric current
(Galvani) to the creation of the first electric power station spans
about a century. It took almost one hundred years to master this
remarkable discovery, which had huge prospects. It is possible also
to note that seventy years passed from the clarification of the
role of mineral fertilizers in the feeding of plants (the middle
of the last century) to their intense use. And they came into wide
use only after the second world war. The discovery of nuclear
fission of uranium was another story. Only three years passed from
the moment of this discovery to the creation of a nuclear reactor,
and 15 years to the creation of the first atomic electric power
station.

The very fact that our time is characterized by extraordinarily
fast use of achievements of science makes all the more important
good organization of scientific investigations and use of their
results in production. Today it is not the country which first makes
a new scientific discovery but the one which is better able to
organize its fastest use in practice that is ahead in realization of
the industrial process."
This peculiarity can be shown in a number of examples.
From the moment of discoveries which in the final analysis led to
the appearance of photography (the first half of the XVIIIth
century) to the introduction of the means of photography in practice
112 years went by. Development of means of telephone communication
took years, radio 35 years, radar 15 years, television 12 years,
and the transistor but 5 years.

Such acceleration of the rate of investigations and developments
requires a corresponding increase in the speed of information systems.
It is clear that this problem can be solved only by way of wide
application in information practice of means of mechanization and
automation.

Consequently, contemporary organization of information activity
must not only help the researcher to look into the rapidly growing
"Himalayas of libraries" but also satisfy the requirements of
users, emanating from many aspects of the development of scientific
thought.

The history of the development of informatika gives us many
examples of the futility of trying to solve the problem of
scientific information by way of creation of complicated machines
simulating information processes. A look back at the past in
American practice is enough to convince anyone of that. Remember
the unrealized hopes connected with the creation of such
information-retrieval devices as "Rapid Selector," "File Search,"
"Minicard," and others. In 195b we attended an American exhibition
organized by the MFD [International Federation for Documentation]
conference in the United States, where there was represented a
wide range of modern machines which were not widely used in
information practice. But at the same time work on such devices
was useful in at least two respects. It permitted understanding
that the basic difficulties of mechanization and automation of
information processes consist not in the absence of necessary
technical means but in the fact that the internal mechanisms of
fulfillment of such processes have not been studied by man. And when
we are not clear on the mechanism of fulfillment of an operation by
man, we can only imitate this process but not simulate it.

Another useful result of works connected with creation of
information-retrieval devices was accumulation of valuable
scientific and technical experience used more and more in various
spheres of the national economy.

Speaking of the place of the machine in information activity,
it is impossible not to remember the definite evolution of views
of specialists with respect to machine translation.

If in the first half of the fifties many researchers flippantly
promised to replace the human translator with a machine in literally
a few years, their groundless optimism was subsequently replaced by
a sober understanding of the exceptional difficulty of the problem
before us. Attempts immediately "to take the bull by the horns" and
realize machine translation in practice right away were replaced by
deep theoretical research. Now there will hardly be found a serious
researcher who would affirm that the problem of machine translation
(in its strict sense) can be solved in the next few years. It is
true that experts of the "brain" corporation Rand, who are specially
engaged in the composition of forecasts of development of science
and technology for the next 50 years, express a rather optimistic
point of view: they expect machine translation to become a reality
by 1970. Regarding an automatic information center, however,
they predict its appearance only in the period between 1970 and 1990.

Consequently, there arises the question of our attitude
toward the problems of machine translation. We have before us the
very complicated problem of automatic machine reviewing and indexing.

It is possible to conclude this from reports which were read
at the UNESCO seminar on the problem of automatic reviewing and
indexing held in Moscow in September 1966.


The scientific aspect of the problem is realization of automatic
reviewing and indexing in the strict sense of these terms. This
means that the unprepared text of the scientific document (book,
article, patent, etc.) is reviewed. The procedures of automatic
reviewing can be pictured, for example, in the following way:

a) translation of the text of the document into a certain
formalized language;

b) identification of the main subject of this document and
expression of the given subject in the formalized language;

c) translation of the abstract from formalized to natural
language.

If the problem of automatic reviewing is solved (the most
important and difficult problem here is algorithmic identification of
the main subject of the scientific document), then automatic
indexing becomes practically realizable. For this it is sufficient
to translate the abstract from the formalized language into another
formalized language utilized in the information-retrieval system.
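
The data flow of steps a) through c), and of the indexing step that reuses them, can be sketched as a composition of stages. This shows only the structure of the procedure: the text stresses that the stages themselves are unsolved research problems, so every stage below is a hypothetical placeholder passed in by the caller, and the toy stand-ins at the end exist only to make the sketch runnable.

```python
from typing import Callable

# Each stage maps one text representation to another. All stage names are
# hypothetical; none of them comes from the source.
Stage = Callable[[str], str]


def auto_review(text: str, to_formal: Stage, find_subject: Stage,
                to_natural: Stage) -> str:
    """Produce a natural-language abstract via steps a), b), c)."""
    formal_text = to_formal(text)                # a) natural -> formalized
    formal_abstract = find_subject(formal_text)  # b) main subject, formalized
    return to_natural(formal_abstract)           # c) formalized -> natural


def auto_index(text: str, to_formal: Stage, find_subject: Stage,
               to_retrieval: Stage) -> str:
    """Reuse steps a) and b), then translate into the retrieval language."""
    formal_abstract = find_subject(to_formal(text))
    return to_retrieval(formal_abstract)


# Toy stand-ins, purely illustrative; they are not real algorithms for
# formalization or for finding the main subject of a document.
def toy_formal(text: str) -> str:
    return text.lower()


def toy_subject(text: str) -> str:
    return text.split(".")[0]


def toy_natural(text: str) -> str:
    return text.capitalize() + "."


def toy_retrieval(text: str) -> str:
    return " ".join(sorted(set(text.split())))


sample = "Minsk 22 stores descriptors. Other details follow."
abstract = auto_review(sample, toy_formal, toy_subject, toy_natural)
index_terms = auto_index(sample, toy_formal, toy_subject, toy_retrieval)
```

The point of the composition is that indexing shares stages a) and b) with reviewing and differs only in the final translation, which is why the text treats indexing as realizable once reviewing is solved.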

Solution of this set of problems is possible only after
carrying out deep fundamental investigations in many branches of
science: linguistics, psychology, mathematical logic,
semantics, semiotics, etc. These investigations should lift slightly
the curtain over the secret of human thinking, which is of
general scientific interest. Such investigations are so complicated
that there is hardly any reason to expect considerable results in
the next few years. But there can be no doubt that the investigations
must be expanded and deepened. It is necessary to pay attention
to the fact that without solution of the problem of machine
translation we will not solve the problem of automatic reviewing
and indexing. For solving this vital problem all methods are useful
and important, including automatic quasi-reviewing
in all its varieties and automatic indexing of abstracts of scientific
documents. Of course, here it is necessary in the first place to
consider such factors as comparative cost of processing of texts,
the gain in time obtained, etc.

It is necessary specially to stress that development and
application of methods intended for solution of such purely
pragmatic problems by no means signifies solution of the problem
of automatic reviewing and indexing in its strict sense, although
it will promote this.

Thus, the examined problem has two aspects, scientific and
practical. Both these aspects are important, but successes in the
solution of the practical aspect of this problem must not be taken
for the solution of its scientific aspect. A clear understanding
of this distinction is a necessary condition of the success of our
further work in this important direction.

Solution of the problem of automatic readout of texts is
especially important in connection with this. Without automatic
reading devices not only can no practical system of automatic
reviewing and indexing be realized, but investigations on a
sufficiently wide scale in this area are impossible. It is necessary
to remember that all proposals to use punched tape obtained from
flexowriters, monotypes, etc., for machine input and other
analogous proposals have no bearing on the problem of automatic
reading of texts, although they are very useful in practical
terms.


The practical side of the problem of automatic reviewing and indexing
is the problem of satisfaction of the information needs of scientists
and engineers today, and not sometime in the future.

One of the trends of scientific investigations which must
be developed before the problems of informatika can be solved is
undoubtedly semiotics. Creation of artificial formalized languages
of science and technology and also languages of generalized
programming is one of the most important conditions of automation of
information activity. Solution of the central problems of structural
linguistics, machine translation and decoding of ancient written
language will make it considerably easier for us to solve the
problem of automatic analysis of the contents of scientific documents.

Semiotics is useful and necessary. Those of you who are
acquainted with reports on it can evaluate its achievements yourselves.
In connection with this I would like to make just one remark.

It is assumed that results of semiotic investigations will find
practical application in information activity in the future, when
the basic problems of automatic analysis of text have been solved.

Meanwhile, we cannot wait that long. The recent resolution
of the government obliges us to create an effective state
system of scientific and technical information soon. This is why it
is absolutely necessary that semiotics already now take part in
a number of concrete works on creation of information-retrieval
systems and on the study of information needs of specialists,
improvement of methods of processing documents, and regulation
of information flows. This will not only unite our efforts in
solving pressing problems of informatika but will also make the
investigations of semiotics itself more purposeful.

One more important trend in informatika appeared in connection
with wide application of machine technology. I have in mind the study
of the system of scientific publications. Until lately the opinion
was that the information crisis arose in connection with
the stormy growth of scientific publications. Authoritative
scientists repeatedly expressed the opinion that publication of
scientific literature needed to be limited and regulated. As
examples it is possible to refer to the well-known project of
J. Bernal proposing replacement of the contemporary system of
scientific journals, or the recent appearance in "Izvestia" of
academician Y. A. Kargin calling for limitation of the number of
published sources.

However, investigations conducted at present abroad clearly
show that we barely know the internal regularities of a system of
scientific publications. Being the basic means of transmission of
scientific information in time and over distance, scientific
documents obey certain objective laws, the neglect of which makes
it impossible to plan information activity correctly, and all the
more so to improve the system of publications.

Long ago attempts were made to study this question on the
basis of counting cited literature and tracing the connections
between documents formed by means of bibliographic references.
However, only after this method was reinforced by the use of ETsVM
did it begin to give perceptible results. We must undertake the
corresponding investigations with respect to both domestic and
foreign publications. Study of the laws of
scattering and aging of publications and creation of information
systems based on the method of bibliographic combination of
documents (in particular, indicators of cited literature) are
among the first steps which must be taken.
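
The tracing of connections between documents through their bibliographic references can be illustrated with a minimal sketch. It assumes that each document is reduced to a list of the sources it cites; the function names and the sample data are hypothetical and are not taken from the report. The first function inverts the reference lists into something like an indicator of cited literature (what is now usually called a citation index), and the second counts shared references between pairs of documents, which is one possible reading of "bibliographic combination."

```python
from itertools import combinations


def citation_index(references):
    """Invert reference lists: cited source -> sorted list of citing documents."""
    index = {}
    for doc, cited in references.items():
        for source in set(cited):
            index.setdefault(source, []).append(doc)
    return {source: sorted(docs) for source, docs in index.items()}


def coupling_strengths(references):
    """Count shared references for every pair of documents that shares any."""
    strengths = {}
    for a, b in combinations(sorted(references), 2):
        shared = len(set(references[a]) & set(references[b]))
        if shared:
            strengths[(a, b)] = shared
    return strengths


# Hypothetical reference lists, invented for illustration only.
REFERENCES = {
    "doc1": ["A", "B", "C"],
    "doc2": ["B", "C", "D"],
    "doc3": ["E"],
}

if __name__ == "__main__":
    print(citation_index(REFERENCES)["B"])
    print(coupling_strengths(REFERENCES))
```

Pairs with a high count of shared references are candidates for being close in subject, which is the kind of regularity such investigations would look for.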

*  * 

* 

The purpose of this report is to show the place of the
machine and its role in information activity. It seems to us that
none of us needs to be convinced that the road to
speeding up all information processes can be built only by
machine automation in combination with the intellectual labor of
man, which is necessary in all cases. Formulation of the requirements
for such machine technology must be approached from scientifically
well-founded positions. We are faced with two large, independent
yet internally interconnected problems.

The  first  concerns  centralized  processing  of  scientific  and 
technical  literature. 

Conditionally it is possible to conceive of this problem as
an information system at the input of which is a flow of world
literature. After processing, at the output of this conditional
system we have bibliographic-signal information and series of
abstract journals with a system of various indicators.
Here the role of ETsVM and its value are distinctly seen.


An  acute  problem  remaining  is  creation  of  automatic  input  of 
printed  text  and  satisfactory  output  from  ETsVM  of  ready  printed 
form  for  reproduction  without  typesetting. 

An exceptional place is occupied by magnifying, highly productive
equipment for printing with and without typesetting. But clarity
in these individual questions does not remove the need to develop
the wide front of research works spoken of in the beginning.

The second problem concerns the entire group of complicated
questions connected with the creation of information-retrieval
systems. Here the role of ETsVM is great as a means of helping to
conduct experimental retrieval works in large volumes and in
shorter periods.

A number of experimental retrieval systems have already been
created in our country in the last few years. All of them are
built on different principles, i.e., they are fragmentary.

Is it possible already today to consider that these works have
completely cleared up the whole problem? It seems to us, not yet.

But at the same time these works are accumulating very interesting
facts whose value is difficult to overestimate. The sum of these facts
will surely help in correctly approaching the creation of a state,
centralized, or coordinated information-retrieval system.

In conclusion I would like to stress again and again the
extreme need to develop a wide front of theoretical and experimental
works illuminating the problem of creation of information-retrieval
systems.


It is possible to say with confidence that an increasing staff
of specialists in the field of scientific information will make their
contribution to the great 50th anniversary of the October Revolution.

BASIC TRENDS IN THE DEVELOPMENT OF INFORMATION-
RETRIEVAL SYSTEMS


D. I. Voskoboynik, G. E. Vleduts and
V. S. Chernyavskiy

Development of information-retrieval systems (IPS) has at present
taken a rather wide swing in connection with the growth
of information in all branches of science and technology,
the increasing requirements of consumers of information regarding periods
of information service, and the acceleration of the
rate of technical progress. Several years ago consumers of information
could be satisfied by retrieval of materials with the help of the
simplest IPS [information-retrieval systems], up to nonmechanized
IPS of the type of library-bibliographic classifications.

Now such systems can no longer satisfy the demands of scientific
and engineering-technical workers. Volumes of reference and
information funds have grown considerably, and interbranch problems
represented by multiaspect materials have become ever more important.

In connection with this the requirements for retrieval systems
became more complicated: they now have to select materials from huge
arrays with much more exact regard for the various semantic aspects of
these materials. EVTsM technology was called in to help. Its high
speed was generally known and had already given a number of practically
perceptible results in various regions of human activity.

... thoroughness and minimum noise it turned out to be necessary to
develop special information-retrieval languages (IPYa) which would
allow recording information in EVTsM in a form useful for algorithmic
processing.

Already in the early works of V. P. Cherenin [1] and
V. A. Uspenskiy [2] dedicated to theoretical analysis of ways of
developing IPS, the authors assumed that formalized IPYa determine
the potential semantic and logical possibilities of IPS independently of
the various methods of their technical realization.

However, this fundamental position, very important for development
of all IPS problems, became widely acknowledged only during the last
2-3 years, especially after the II All-Union Conference on
Information-Retrieval Systems and Automatic Data Processing. In general, one
should note that the period which has passed since the II All-Union
Conference is characterized not only by considerable growth in the
number of works in the IPS field, but also by an essential increase in
their scientific level. In this period trends also developed which
were totally unrepresented earlier. Therefore, we can now present
a more or less whole and systematized picture of the whole front of
domestic developments in the IPS field during the last few years.

In the development of information-retrieval languages it is
possible to trace two branches from the very beginning: IPYa for
factographic IPS and IPYa for documentographic IPS.

Let  us  consider  in  the  beginning  the  first  branch. 

As was shown in the work of V. A. Uspenskiy, the way to create
sufficiently rich formalized languages for recording facts of natural
sciences is through development of the metatheory of these sciences.

In it the basic forms of objects and relations corresponding
to the elementary ideas of the given discipline, the methods of
construction of complicated ideas from elementary ideas, and the
reasoning procedures applied in the given field are investigated.

On  the  basis  of  created  metatheories  formalized  languages  are 
constructed  as  methods  of  recording  facts  with  the  use  of  the 
symbolism  of  mathematical  logic.  Formalized  languages  thus  built 
can  be  used  in  systems  automating  not  only  information  retrieval,  but 
also  processes  of  forecasting  new  facts  in  so-called  "information- 
logic"  systems. 

Examples of particular formalized languages created on the basis
of the above-mentioned metatheoretical-logical approach are formalized
languages developed for geometry and organic chemistry. The
information-logic language built on the basis of narrow predicate calculus by
A. V. Kuznetsov, Ye. V. Paducheva, and N. M. Yermolayeva [3] for
geometry was subsequently a successful object for investigations
carried out by Ye. V. Paducheva [4] of the problems of translation
from information-logic languages to Russian. G. E. Vleduts and
V. K. Finn [5] developed an information-logic language for structural
organic chemistry. This formalized language was taken as
the basis for development of a big information-logic system for the
field of chemistry. In this system the units of information are
properties of chemical compounds and processes of their mutual
transformation, i.e., chemical reactions. For machine recording
of the structure of organic compounds and equations of organic
reactions, atomic (topological) linear recordings
of chemical graphs are used, and also several different systems of so-called
"filter recordings," reflecting with various degrees of detail the
basic peculiarities of the structure of compounds and the chemism of reactions.

For  the  putting  into  the  system  of  nonlinear  chemical  structural 
information  there  have  been  developed  special  systems  of  primary 
coding  close  to  the  nomenclature  language  and  designations  accepted 
in  chemistry. 

Relying on the peculiarities of the formalized language utilized in
the system, it was possible to formulate algorithms for the solution of a
number of information-logic problems from the field of chemistry, in
particular, a probable forecast of reactivity and ways of synthesis
of organic compounds.


Other examples of factographic IPS are the system for inorganic
chemistry developed by A. L. Seyfer and his colleagues [6], in which
the chief attention is focused on recording and retrieval of properties
of compounds expressed in numerical form, and also the IPS for
physicochemical systems with two or more components based on the formalized
language of physical-chemistry phase diagrams.

During construction of the information languages forming the basis
of factographic IPS, the above-mentioned metatheoretical approach
to formalization of the language of the corresponding fields of
science was more or less carried out.

Another approach, to construction of information languages of the
so-called "descriptor" type for sufficiently broad fields
of science and technology, is based on study of the vocabulary specific to the
natural language of the corresponding branch. It is accepted practice
to call terminological dictionaries thus compiled, made up of words and word
combinations grouped in classes of term-descriptors sufficiently
close or equal in meaning, "thesauri" or descriptor dictionaries.

In thesauri, besides exposure of the relationships of synonymy
and homonymy of terms, the semantic relationships
existing between descriptors are also exposed. These relationships, reflecting
objective relationships in the examined subject field, are
customarily called "basic" relationships according to the terminology
introduced by S. Chernyavskiy.

Descriptor languages are widely used in documentographic IPS.

An example of a descriptor dictionary for large-scale documentographic
IPS in the field of applied chemistry and the chemical industry is
the thesaurus developed under the leadership of V. 3. Margaritov.

The retrieval forms of documents in descriptor
information-retrieval languages without grammar have the form of simple sets of
descriptors. Use of such languages essentially simplifies
algorithmization of the process of translation from natural to
formalized language, which in this case essentially boils down to
word-for-word comparison of the text of the document in natural
language with the thesaurus; difficulties connected with recognition
of homonymy are easy to solve by lexical analysis of the contextual
surroundings of homonymic words. This was shown by V. S. Chernyavskiy
and his colleagues on an example of translation of texts of abstracts
on electrical engineering from Russian to a descriptor language without
grammar. A language of this type, with basic relationships established
between descriptors, was taken by V. S. Chernyavskiy,
D. G. Lakhuti and E. S. Bernstein [7] as the basis of the "Pusto-Nepusto"
experimental IPS developed by them for the field of electrical
engineering. In particular, the "Pusto-Nepusto" IPS, in which documents
are automatically indexed according to the above-mentioned principle,
is realized with the help of the "Minsk-22" EVTsM.
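
The word-for-word comparison with a thesaurus described above can be sketched in a few lines. This is only an illustration under stated assumptions, not a reconstruction of the "Pusto-Nepusto" algorithm: the miniature thesaurus, the homonym table, the context window of two words, and the function name `index_text` are all hypothetical.

```python
import re

# Hypothetical miniature thesaurus: each word maps to its descriptor.
THESAURUS = {
    "transformer": "TRANSFORMER",
    "transformers": "TRANSFORMER",
    "winding": "WINDING",
    "coil": "WINDING",          # synonym grouped under one descriptor
    "insulation": "INSULATION",
}

# Hypothetical homonym table: a descriptor is assigned only when
# one of the cue words occurs near the homonymic word.
HOMONYMS = {
    "current": [({"electric", "alternating", "direct"}, "ELECTRIC_CURRENT")],
}


def index_text(text, window=2):
    """Translate a text into a set of descriptors (a language without grammar)."""
    words = re.findall(r"[a-z]+", text.lower())
    descriptors = set()
    for i, word in enumerate(words):
        if word in THESAURUS:
            descriptors.add(THESAURUS[word])
        elif word in HOMONYMS:
            # Lexical analysis of the words surrounding the homonym.
            context = set(words[max(0, i - window):i] + words[i + 1:i + 1 + window])
            for cues, descriptor in HOMONYMS[word]:
                if cues & context:
                    descriptors.add(descriptor)
                    break
    return descriptors


if __name__ == "__main__":
    print(sorted(index_text("Alternating current in the transformer coil")))
```

The result is a plain set of descriptors, so word order and the relationships between the words in the text are lost, which is the limitation the report discusses next.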

Information languages of the descriptor type have been developed
for IPS intended for separate, more or less wide regions of
technology, in particular for a number of subfields of radio
electronics, tractor construction, machine-tool building, and others.

In the Central Institute of Patent Information thematic dictionaries
for corresponding divisions of the patent fund are compiled
simultaneously with development of the overall experimental system
of machine translation and automatic indexing.

As we have already mentioned above, retrieval samples of documents
in descriptor languages without grammar have the form of simple sets
of descriptors. Because these languages lack grammatical
means, such retrieval samples cannot reflect the contextual
relationships into which the semantic units corresponding to descriptors
enter in concrete text. During algorithmic retrieval this deficiency
leads to an increase in the percentage of unnecessary delivery,
i.e., to a rise in the number of documents issued by the IPS that do not
satisfy the inquiry. To decrease this "retrieval noise" descriptor
languages are given grammar elements, the simplest of which are
so-called "role indicators," which are attached to individual
descriptors in the retrieval form for the purpose of defining more
accurately their meaning in the examined context. A. V. Sokolov
proposed using three role indicators, the first of which would
separate one or more main descriptors corresponding to the main
subject or subjects of consideration. This role indicator
distinguishes the main descriptors playing the role of "subjects" from
the remaining descriptors fulfilling explicative functions of
"adjectives." Furthermore, role indicators of negation (absence)
and multiplicity are used.
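
How role indicators reduce retrieval noise can be shown with a small sketch. The representation is an assumption made for illustration: a retrieval form is a set of (descriptor, role) pairs, and the role labels "main", "aux", and "absent" are hypothetical stand-ins for the indicators described above, not the notation actually proposed.

```python
def matches(query, pattern):
    """Return True when every requested (descriptor, role) pair is in the pattern.

    A role of None in the query means "any role", which reproduces
    retrieval in a descriptor language without grammar.
    """
    for descriptor, role in query:
        if role is None:
            if not any(d == descriptor for d, _ in pattern):
                return False
        elif (descriptor, role) not in pattern:
            return False
    return True


# Hypothetical retrieval forms, invented for illustration only.
DOC_A = {("CORROSION", "main"), ("STEEL", "aux")}       # corrosion is the subject
DOC_B = {("STEEL", "main"), ("CORROSION", "absent")}    # corrosion is stated to be absent

if __name__ == "__main__":
    plain_query = {("CORROSION", None)}
    role_query = {("CORROSION", "main")}
    print(matches(plain_query, DOC_B))  # issued without roles: noise
    print(matches(role_query, DOC_B))   # rejected once the role is required
    print(matches(role_query, DOC_A))
```

Without roles, both documents are issued for an inquiry about corrosion. With the role required, only the document whose main subject is corrosion is issued.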

At the Institute of Cybernetics of the Academy of Sciences of
the Ukrainian SSR under the leadership of E. F. Skorokhod'ko [8] an
IPS has been developed for the field of computer technology based
on a descriptor language in which well-developed grammatical means
are used. In this language more complicated terms of different rank
are constructed from a comparatively small number of
base (elementary) terms with the help of a limited number of base
related terms fulfilling the role of one-place and two-place
predicates. "Scannings" of complicated terms thus obtained reflect the
structure of definitions of complicated ideas with the help of the initial
elementary ideas corresponding to the base terms. Scannings of
terms assign basic semantic relationships between terms, on the
basis of which semantic correspondence between retrieval patterns
of documents and inquiries is established. Related terms are used
also  to  transmit  textual  relationships  between  terms  in  retrieval 

For the field of synthetic organic chemistry U. A. Stokolova
and D. G. Lakhuti [9] built a descriptor language with grammar in
which syntactical connections are expressed by the method of
"standard phrases." A standard phrase is a means of the
multiplace-predicate type, the places of which are filled with
term-descriptors. Every form of standard phrase is used for recording
a certain type of information, and the place occupied by a term in
the standard phrase strictly determines its function in context.
To record in the information language the texts constituting titles of
abstracts of articles from the examined field it turned out to be
sufficient to use three forms of standard phrases of various degrees
of complexity. The basic relationships between terms are expressed
in the form of classification diagrams; several different forms of
relationships are used. Experiments in thematic retrieval of
information with the use of this language showed that very high values
of the coefficients of accuracy and thoroughness of retrieval are attained
in this way.
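
The report does not define the coefficients of accuracy and thoroughness. The sketch below assumes a common reading: accuracy is the share of issued documents that satisfy the inquiry, thoroughness is the share of satisfying documents that were issued (usually called precision and recall), and noise is the share of issued documents that do not satisfy the inquiry. The function name and the sample numbers are hypothetical.

```python
def retrieval_coefficients(issued, relevant):
    """Return (accuracy, thoroughness, noise) for one inquiry.

    issued   -- documents delivered by the system
    relevant -- documents that actually satisfy the inquiry
    """
    issued, relevant = set(issued), set(relevant)
    hits = issued & relevant
    accuracy = len(hits) / len(issued) if issued else 0.0
    thoroughness = len(hits) / len(relevant) if relevant else 0.0
    noise = 1.0 - accuracy if issued else 0.0
    return accuracy, thoroughness, noise


if __name__ == "__main__":
    # Four documents issued, three satisfy the inquiry, two of those were issued.
    print(retrieval_coefficients([1, 2, 3, 4], [2, 3, 5]))
```

Under these assumed definitions the two coefficients pull in opposite directions: issuing more documents tends to raise thoroughness and lower accuracy.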

It is necessary to stress that from the point of view of the
operational qualities of IPS, besides the semantic force of the information
languages utilized in them, the peculiarities of the methods of realization
of these systems are important, in particular such methods of organization of
information arrays as the direct and inverse methods.

From this point of view one should note development of the method of
associative programming, which permits realizing the
associative-address method of organization, combining a number of positive
features of both the direct method and the inverse method. This
method is now realized in the "getKa-5" IPS, which services a number
of branches of radio electronics, and in an IPS on materials of
cardiovascular surgery.
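
The difference between the direct and inverse methods of organizing an information array can be sketched as follows. The sketch assumes the usual meaning of the two terms (a direct array lists descriptors per document, an inverse array lists documents per descriptor) and does not attempt the associative-address method, whose details the report does not give. All names and data are hypothetical.

```python
def build_inverse(direct):
    """Turn a direct array (document -> descriptors) into an inverse one
    (descriptor -> documents)."""
    inverse = {}
    for doc_id, descriptors in direct.items():
        for descriptor in descriptors:
            inverse.setdefault(descriptor, set()).add(doc_id)
    return inverse


def search_direct(direct, query):
    """Scan every document and keep those containing all query descriptors."""
    if not query:
        return set()
    return {doc_id for doc_id, descriptors in direct.items() if query <= descriptors}


def search_inverse(inverse, query):
    """Intersect the document lists of the query descriptors only."""
    if not query:
        return set()
    postings = [inverse.get(descriptor, set()) for descriptor in query]
    return set.intersection(*postings)


# Hypothetical direct array, invented for illustration only.
DIRECT = {
    1: {"RELAY", "CONTACT"},
    2: {"CONTACT", "WEAR"},
    3: {"RELAY", "CONTACT", "WEAR"},
}

if __name__ == "__main__":
    inverse = build_inverse(DIRECT)
    print(search_direct(DIRECT, {"RELAY", "CONTACT"}))
    print(search_inverse(inverse, {"RELAY", "CONTACT"}))
```

Both searches give the same answer. The direct method reads the whole array for each inquiry but is simple to update, while the inverse method touches only the lists for the requested descriptors at the cost of maintaining a second structure.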

To evaluate the influence of various structural elements of IPS
on their operational parameters, experimental studies comparing the
efficiencies of various systems have positive value. Such a work,
in which an experimental method was worked out and the efficiency of
traditional IPS (based on library methods of classification and
cataloging) was compared with that of descriptor IPS, was carried out
at the Leningrad Institute of Culture on the initiative of
A. V. Sokolov [10].

To further improve IPS it is necessary to intensify theoretical
research, which will allow comprehending, correctly estimating, and
improving the means of expression of information languages. This, of
course, is impossible without thorough comparison of the peculiarities
of the means of expression of artificial and natural languages. From
this point of view the works already mentioned on automatic indexing,
i.e., automatic translation from natural languages into
various kinds of information languages, have great value, and also works
close to them on automatic reviewing and automatic composition of subject
indicators. An important general trait uniting all these processes
is compression of information, i.e., transition to the most important
characteristics of the meaning of the document.


The most interesting results in the field of automatic indexing
(cataloging) are obtained during overall use of various principles,
in particular, during combination of means of the following three
different types: 1) lexical means in the form of precomposed lists
of words (terms) significant or insignificant for the given field,
2) statistical data on the particularity of different words (terms)
in the examined text or in the totality of texts from the given region
and 3) results of syntactical or another kind of formal
structural-linguistic analysis utilized for identification of semantic categories.

In the first works on questions of automatic reviewing [11, 12]
diverse variants of purely statistical procedures of sampling
weighted phrases of the reviewed text were investigated. Recently
methods have been developed combining statistical methods with
the other means mentioned above. Thus, in the work of V. M. Gorobtsov
[13] on automatic cataloging, along with frequency considerations,
account is taken of the entry of words both into descriptor dictionaries
of the whole field and into descriptor dictionaries compiled for
texts pertaining to various subject classes, and also of certain
grammatical characteristics of words. In the end, the text is
classed under a certain subject heading with a known degree of
probability.

In the work of N. I. Styazhkin and his colleagues [14], after
translation of the text of the document into descriptor language, the
phrases containing the greatest number of descriptors encountered
in the title of the article and encountered jointly with the
descriptors of the title are selected: in this way
"author's abstracts"^ of acceptable quality are obtained, in the sense
that deviation between them and "abstracts" composed by people (by
way of sampling phrases taken from the reviewed text) is not too great.

^This term is used here in the sense of a "paper" composed
automatically.

In the work of E. V. Yakushin [15] the basis for algorithmization
of composition of subject recordings is a certain list of base terms
fixed for the given field. Grammatical (syntactical) criteria are
used to expose in the text other words
explaining these base terms and forming together with them "calling
pairs," which are used as subject recordings.
By this method subject recordings fully
comparable with subject recordings composed by indexers are obtained.

Along the line of use of linguistic means is the work of
I. P. Sevbo [16], in which the results of a complete syntactical
analysis of a certain sequence of phrases are used: here not only
does separating of certain semantically weighted sections of phrases
(nominal groups) occur, but procedures of their unification in
chains connected in meaning also appear; the result (an abstract of
the annotation type) is a list of the names discussed in the text.

Comparing the level of domestic and foreign works over the whole
complex of questions pertaining to formalized languages of science
and technology, and, in particular, the state of theoretical
research in this field, it is possible to note that we have serious
achievements creating favorable prerequisites for successful
advance.

Further progress in development of both methods of construction
of rich enough information languages and methods of translation
into them from natural languages and back requires essential
deepening, expansion, and drawing together of fundamental
structural-linguistic and logic investigations. From this point of view research
in creation of semantic information theory is paramount; important
steps in this direction are made in works of Yu. A. Shreyder [17] by
way of generalization and deepening of the thesaurus concept.

It is necessary to allot much attention to investigations directed
towards automation of individual stages of the processes of creation of
IPS, in the first place automatic composition of thesauri. For these
purposes it is necessary to apply statistical methods more widely to
data stored in big information funds.


Extraordinarily important problems appear in connection with
providing IPS with machine technology. The main difficulty here is
caused by the comparatively low class of series-produced domestic EVTsM:
insufficient volumes of storage units, poorly developed parallelism
of devices, and low reliability, especially of external devices.

These deficiencies lead to the need to save machine
operations and memory volumes rigidly in the process of programming, which
in turn prevents standardization and automation of programming. In
the end programming of even not very complicated algorithms is turned
into time-consuming work. At present programming is frequently the
cause of prolonged delays in carrying out necessary experiments and
in realization of already developed IPS.

It will be possible to change this position if industry
schedules EVTsM at the level of contemporary average world standards.

Although we cannot at present completely answer the question of
optimum parameters of EVTsM intended for information or semantic
purposes, accumulated experience permits formulating certain
unconditionally necessary requirements.

Speed is not a critical parameter, but it is desirable to bring
the number of operations per second up to at least 100 thousand.

Capacity of storage units: fast store should be brought up to
52 thousand words, and the capacity of external memory should be
brought up to several billion bits.

Types of storage units: access to external memory should be
facilitated, and it is desirable to have drums or disks or sufficiently
convenient tape units. External memory in which, at least with respect
to one process (readout), access time would be of the same order as in
fast store is very useful. All these devices are developed at
domestic enterprises, but their introduction and issue lag.

Input and output: the presence of many simultaneously operating
input and output devices is absolutely necessary, in particular readers
and multialphabetic high-speed output units of the photosetting-machine
type.

Parallelism: independence and parallelism of functioning of
all machine units are necessary, and also the possibility of operating
in the multiprogramming mode.

These requirements will be made more precise or modified with the
accumulation of practical experience. However, one should recognize that
the process of accumulation of this experience is extremely slow. We have a
certain number of successfully operating experimental systems, but as
yet no regularly functioning big information-retrieval services.
And meanwhile, only in the process of industrial exploitation is it
possible to organize comparative investigations and to work out
optimum criteria.

Let us try, however, to formulate certain general positions,
proceeding from existing domestic and foreign experience.

For effective exploitation of IPS they must be used for
simultaneous solution of two problems: 1) for selective (address)
notification of consumers about new entries on the assigned subject;
2) for retrospective retrieval with respect to inquiries. It is
necessary to design systems which could provide the serviced circle
of consumers with all forms of information necessary to them. For
this purpose automated IPS should, besides solving the above-indicated
problems, also be widely used for composition of various types of
signal bulletins or subject indicators (in particular, of the
permutation type). Automated IPS realized on EVTsM must also
produce punched-card variants of small retrieval systems useful for
reproduction and use on site with application of simple means of
mechanization.
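
A subject indicator of the permutation type can be illustrated with a short sketch. It assumes that such an indicator resembles what is usually called a permuted (keyword-in-context) index, in which every significant word of a title becomes an entry point. The stop-word list, the "/" separator, and the function name are hypothetical choices made for the example.

```python
# Hypothetical list of insignificant words that do not become entry points.
STOP_WORDS = {"a", "and", "for", "in", "of", "on", "the"}


def permuted_index(titles):
    """Return sorted (keyword, rotated title, original title) entries."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue
            # Rotate the title so that the keyword comes first; "/" marks
            # the place where the original title began.
            tail = ["/"] + words[:i] if i else []
            rotated = " ".join(words[i:] + tail)
            entries.append((word.lower(), rotated, title))
    return sorted(entries)


if __name__ == "__main__":
    for keyword, rotated, _ in permuted_index(["Retrieval of Patents"]):
        print(f"{keyword:12} {rotated}")
```

Because the entries are produced mechanically from titles, an indicator of this kind can be composed without manual indexing.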

In the process of realization of large-scale branch systems and
information services an important role must be played by the
ministries and branch information centers subordinate to them.

Disposing of big material resources, ministries can make a decisive
step for transition from experimental and test systems to large-scale
operational services. All branch services must be intelligently
combined with the centralized system of processing scientific and
technical literature in VINITI and draw from VINITI materials for
filling branch IPS in a form convenient for this. It is clear that
VINITI, as head institute, is ready to participate actively in the
process of designing branch services and must generalize and spread
experience accumulated during designing and exploitation.

We must always remember the indication of Comrade A. N. Kosygin,
Chairman of the Council of Ministers of the USSR, made by him in a
speech at the XXIII Congress of the CPSU, about the need to create a
highly effective, harmonious, and reliable state system of scientific
information. An interconnected system of branch services, servicing
in the beginning narrow and then wider and wider fields of science
and technology, gradually has to satisfy all interests of scientific
and engineering-technical workers. Complexes of machines supplied
with a branched network of input (reading) and output devices
placed directly in places of generation and consumption of information,
in scientific-research and research-design establishments, and united
by means of communication with central devices, will fulfill more and
more diverse forms of data processing. Storing in their memory units
the whole mass of introduced information, they will deliver it to
consumers in accordance with thematic requisitions formulated by them,
which will be made more precise as a result of constant feedback.

Information should be delivered in the form of compressed summaries,
but on demand of the consumer also in the form of detailed abstracts
or detailed factographic references. Furthermore, periodically (in
accordance with thematic profile) or on demand, scientists, engineers
and leaders have to obtain thematic surveys or specialized indexes
and, in the future, more complicated products of logical and
statistical processing of accumulated information, up to forecasts of
facts or hypotheses.




Results obtained up to now, familiarization with many of which
awaits us at sessions of sections, open entirely realistic prospects
for achievement of the outlined targets.
PREPARATION OF ENGINEERING AND SCIENTIFIC CADRES WITH
RESPECT TO MECHANIZATION AND AUTOMATION OF
INFORMATION WORKS


Doctor of Technical Sciences Prof. I. I. Petrov and I. B. Pochkay

The most important condition necessary for successful development
of scientific and technical information in the country is automation
of various information processes based on computer technology and
other contemporary means of automation. In connection with this,
questions of educating specialists in mechanization and automatic
data processing have become very important. Questions of training
cadres were given especially great attention at the XXIII Congress of
the CPSU. In the current report of the Central Committee of the CPSU
it is stressed that these questions have to be advanced to the level
of general political problems of the party and state.

Up to 1965 there was in general no preparation of specialists of
such a profile in the USSR, although by this time about 100,000 people
were working in the system of scientific and technical information.

It is natural that unproductive "manual" methods have predominated
in information services and that processes of information service
were mechanized and automated very slowly. Taking all these
circumstances into consideration, the Ministry of Higher and Special
Secondary Education of the USSR decided in 1965 to create in the
higher school system specialty No. 0640, "Automation and mechanization
of processes of processing and delivery of information." This
specialty is offered in four higher educational institutions of the
country: the Kuybyshev Polytechnical Institute, the Tomsk Institute
of Radio Electronics, the Sevastopol' Instrument-making Institute,
and the Tallin Polytechnical Institute.

In the first year of all these institutes 100 people are being
taught, and in the second year 75 people. Certainly, this is few.
In order to expand preparation of engineers for the information
organs of the country, it is necessary to ask the Ministry of Higher
and Special Secondary Education of the USSR to offer in the near future
specialty No. 0640 in at least [illegible] higher educational
institutions of the country, including the higher educational
institutions of Moscow, Leningrad, Kiev, Sverdlovsk, and other cities
where an especially sharp need is felt for specialists in automatic
data processing and where there are scientific and pedagogical cadres
in the field of automatics, computer technology, and technical
cybernetics. The solution of this problem is of interest not only
to the information organs of the country but also to the Ministry of
Instrument-making, Means of Automation, and Control Systems of the
USSR, which is assigned development and production of special
technical means and systems for processing, storage, and retrieval of
scientific and technical information; copying-duplicating equipment;
means of microfilming; typesetting typewriters; computers; and library
equipment. Here it should be especially stressed that the specific
nature of specialty No. 0640, the basic disciplines of which are based
on computer technology, automatics, electronics, communication
engineering, etc., requires scarce and complicated equipment for the
organization of laboratories for this specialty. Therefore, with the
offering of this specialty in higher educational institutions it is
necessary to provide the corresponding material base, allowing for
this sufficient means and funds, including the purchase of imported
equipment.


A very important problem also requiring immediate solution is
the improvement of the curriculum of specialty No. 0640 and of the
programs of the disciplines entering into it. Owing to the novelty of
this plan and the limited experience of its compilers, many of its
tenets turned out to be insufficiently founded.


At present the Scientific and Methodical Council on Automation
of Industrial Processes of the Ministry of Higher and Special
Secondary Education of the USSR has recommended proceeding from the
following general principles in developing new curricula on automation:

1. Disciplines of physicomathematical and engineering cycles
must  not  contain  obsolete  information,  methods  of  calculations  and 
investigations  historically  composed,  but  having  lost  practical  value, 
and  material  duplicating  other  disciplines.  Programs  of  these 
disciplines  must  be  based  on  the  attained  level  of  natural  (mathematics, 
physics)  and  applied  technical  sciences. 

2.  A  mathematics  course  must  reflect  the  bases  of  mathematical 
logic,  the  principles  of  calculus  of  variation,  probability  and 
information  theories,  and  other  divisions  needed  to  increase  the 
mathematical  level  of  the  specialist  in  automation.  These  divisions 
must  not  be  introduced  into  the  course  as  simple  additions  to  the 
existing  complex  of  mathematical  questions,  but  must  be  an  organic 
part of the whole course of higher mathematics. It is especially
important that the study of higher mathematics be conducted on the
basis of application of computer technology. In general, provision
should be made for using computer technology in all disciplines of
both physicomathematical and engineering cycles and the special
cycle.


3. The contents of disciplines in theoretical mechanics,
strength of materials, theory of machines and mechanisms, descriptive
geometry and drawing, and other general-engineering disciplines not
directly related to specialties in automation should be radically
reexamined. It is necessary to reduce somewhat the nomenclature of
these disciplines, decrease the volume of certain of them, and
thoroughly examine their contents.

4. It is necessary to strengthen in all possible ways the general
and special education of future automation engineers in
electrical-engineering disciplines, electronics, and the general
theory of automatic control.

5. It is desirable in forming the nomenclature of special
disciplines not to fragment it into a whole series of small courses,
but to provide in the curriculum for fundamental study of enlarged
special disciplines embracing the basic questions of the specialty.
Such an approach will exclude duplication of material in programs of
separate disciplines and increase the quality of preparation of
automation specialists of a wide profile. It is necessary to provide
for a certain sequence in the study of special disciplines so that the
study of the basic special disciplines precedes the study of narrower
special disciplines.

6. A very important problem in the examining of the contents
of disciplines of automation curricula is elimination of scholasticism
in the study of these disciplines. This scholasticism is caused
by the tendency to describe any phenomenon or process with only a
mathematical formula without any explanation of the essence of the
phenomenon or process. Departure from physics leads the student
to perceive the physical process only through the prism of
mathematical expression, without penetrating into the essence of the
phenomenon, and to become helpless if this process is modified.


Unfortunately, in the curriculum of specialty No. 0640 affirmed
by the Ministry of Higher and Special Secondary Education of the USSR,
the expounded principles were not adequately taken into consideration,
which made the plan far from perfect.


Thus, the plan contains the discipline "Hydraulics and hydraulic
machines" (125 hours), which is not strictly necessary for the profile
of the specialist being educated and whose exclusion would do no
harm. The removal from the curriculum of the courses "Strength of
materials," "Theory of machines and mechanisms," and "Machine parts,"
while the course "Theoretical mechanics" is left in it, is
insufficiently founded. Here, as is done in the curriculum of
specialty No. 0606, "Automatics and telemechanics," and in curricula
of other automation specialties, it would be more expedient to combine
all these disciplines into a single course, "Mechanics," and mainly
stress questions of dynamics, the theory of elasticity, and the theory
of oscillations in it.

No provision is made in the plan for the study of one of the
most important basic disciplines for automation engineers, namely
"Theoretical bases of electrical engineering," containing expanded
theory of electric circuits at the expense of a certain reduction
in field theory. Instead, the plan introduces a course in "General
electrical engineering," which can in no way be considered justified.
This error must be corrected and a large number of training hours
must be assigned to the study of "Theoretical bases of electrical
engineering."

Absent from the curriculum are disciplines important for future
automation engineers, such as the "Theory of Automatic Control and
Checking" and "Mathematical Bases of Cybernetics," which leaves
serious gaps in the plan of specialty No. 0640.

Special disciplines are in especially bad shape. They are
excessively fragmented and to a considerable extent duplicate one
another; therefore, the composition of programs in these disciplines
is very difficult. Examples of such disciplines are "Technological
processes, machines, and apparatuses of scientific and technical
information" (2?7 hours), "Means of reproduction of scientific
information" (140 hours), "Specialized additional units and devices of
data processing computers" ([hours illegible]), "Systems of punching
technology and electronic computers" (85 hours), "Construction and
exploitation of punched-card computers" (50 hours), and several others.

In order to correct these deficiencies it is apparently necessary
to enlarge special disciplines and provide for the study in them of
such questions as information-retrieval systems, automation of
technological processes, and others. The sequence of study of
special disciplines is not maintained in the curriculum. An example
is the course "Bases of scientific and technical information," the
study of which starts only in the 8th semester, while highly
specialized disciplines start to be studied from the 6th semester,
that is, they precede the basic special discipline. This is an
essential deficiency of the curriculum, and it must be corrected.

Thus, the curriculum of specialty No. 0640 now in effect is
imperfect and needs essential corrections. Inasmuch as students
of specialty No. 0640 are so far only in the second year, there is
time for these corrections. It is necessary to ask the Ministry of
Higher and Special Secondary Education of the USSR to examine the
curriculum of specialty No. 0640 and correct it as necessary.

Let  us  now  turn  to  the  question  of  preparation  of  scientific 
cadres  for  the  information  services  of  the  country. 

At present scientific cadres are taught automatic data processing
in the USSR only at VINITI. In 1959 it began to offer postgraduate
work in the three following specialties: "Scientific and technical
information," "Computer technology," and "Computer mathematics." All
these specialties are being studied by 62 graduate students, including
[illegible] people studying "Scientific and technical information." In
the past 6 years 19 people have completed postgraduate work, 3 of them
in 1966. This is very few. Apparently, in the near future it
will be necessary not only to offer specialty No. 0640 for preparation
of engineers (which was mentioned above) in big higher educational
institutions of the country but simultaneously to organize
preparation of graduate students in automatic data processing in
these higher educational institutions.

A large potential reserve for preparation of candidates of science
in the field of information is the so-called "competitors" from among
the leading specialists of information services. Thus, in
1965 out of the number of VINITI workers alone, and especially from
its scientific-research subdivisions, more than 25 engineers started
to work on dissertations, using VINITI scientific-research
laboratories as an experimental base.

The educating of scientific cadres in the field of automation
of information processes is considerably deterred by the inadequate
informing of specialists about the scientific set of problems of this
new branch of knowledge. This set of problems is very interesting and
many-sided. It is formed at the junction of many sciences and uses
both achievements in the field of semiotics and achievements in the
field of automatics and telemechanics, computing and electronic
technology, cybernetics, and other sciences. It would be very timely
to compose and issue a special scientific and methodical aid giving
an account of the basic problems of scientific information and
problems of automation of information processes, and to diffuse it
widely among the specialists of information services of the country.

Considerable difficulties arise in the selection of scientific
leaders of graduate students and competitors. It is necessary to
attract more widely to such leadership not only scientists working in
organs of information but also scientists from various kinds of
scientific research institutes and higher educational institutions,
the thematic orientation of which is close to the problems of
automatic data processing (institutes of cybernetics, automatics
and telemechanics, computer technology, etc., and also the
corresponding departments of higher educational institutions).

At present problems of increasing scientific and engineering
qualification of cadres occupied in the field of development, mastery,
and exploitation of information-retrieval systems and mechanization
and automation of processes of processing of scientific and technical
information are becoming very important. It is necessary to practice
more widely the organization of permanent and short-term
courses for scientific and engineering cadres and also leading cadres
of institutes of information, using the experience of VINITI. At
those higher educational institutions of the country where specialty
No. 0640 is offered, it is expedient to create courses to increase
the qualification of specialists working in the field of automation
and mechanization of information processes. It is also necessary to
attract more widely candidates and doctors of sciences
working or having education in the field of scientific information
to the reading of lectures and teaching in higher educational
institutions, in various courses, and at various seminars.


THE LOGIC OF DESCRIPTOR RETRIEVAL SYSTEMS
V. S. Chernyavskiy

During construction of retrieval systems, including so-called
descriptor retrieval systems, it is always necessary to make a large
number of different assumptions, which usually are not clearly
formulated. These assumptions, which in the most essential manner
determine both the structure and the properties of created systems,
cannot be derived from any theories developed up to the present time
and at the same time, as far as it is now possible to judge, cannot
be confirmed or refuted by any experiment other than
direct experimenting with retrieval systems built on their basis.
Therefore, the most natural evaluation of retrieval systems is one
which operates explicitly with the assumptions on which they are
based and rests on more or less considerable experience of their
exploitation. In this article such analysis is conducted for two
systems of the "Pusto-Nepusto" class: the system "Pusto-Nepusto-1,"
developed by Bernshtein, and "Pusto-Nepusto-2," developed by Lakhuti.

At the source of that group of retrieval systems to which there
belong, in particular, retrieval systems of the "Pusto-Nepusto" class,
there lies an idea of fundamental importance, first
expressed and realized, as far as can be judged, by Mortimer Taub in
his "Uniterm" system. This idea consists in the fact that in natural
language it is possible to single out certain "significant" words such
that, with a completeness sufficient for the purposes of information
retrieval, the contents of documents and inquiries will be transmitted
by an unordered set of the "significant" words entering them. In other
words, the Taub idea consists in the fact that for retrieval purposes
it is sufficient to consider that part of the contents of documents
and inquiries which is transmitted by their dictionary composition.


Both this idea itself and various modifications of it have
provoked and are till now provoking numerous objections. These
objections basically boil down to affirmation that retrieval systems
not taking textual relationships between words into account cannot
be effective. As an argument there are usually given various examples,
such, let us say, as "influence of dyes on bacteria" and "influence
of bacteria on dyes." These examples have to show that the dictionary
composition of sentences can be the same, and at the same time, if
one of them is considered an inquiry and the other, let us say,
the title of a document, then the document should hardly be issued
on demand.
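
The objection can be made concrete with a minimal sketch. It assumes a
hypothetical list of non-significant words (the source gives none) and
treats the dictionary composition of a phrase as the unordered set of
its remaining words; the function name is illustrative only.

```python
# Hypothetical list of non-significant words; the source does not supply one.
STOP_WORDS = {"of", "on", "the", "a", "an", "in", "and"}


def dictionary_composition(text):
    """Return the unordered set of significant words of a text."""
    return frozenset(w for w in text.lower().split() if w not in STOP_WORDS)


inquiry = dictionary_composition("influence of dyes on bacteria")
title = dictionary_composition("influence of bacteria on dyes")

# Word order is discarded, so the two phrases become indistinguishable.
print(sorted(inquiry))
print(inquiry == title)
```

Under these assumptions both phrases reduce to the same set of three
words, which is exactly the situation the objection describes.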


In spite of the apparent convincingness of such objections,
successfully exploited retrieval systems of the "Uniterm"
type are at present well known, and this simple fact indicates that the
matter lies by no means in examples contradicting the Taub idea or any
other idea, but in whether such examples are encountered in concrete
conditions of functioning of the retrieval system and in a quantity
noticeably lowering its effectiveness. Thus, one may assume that in
spite of the existence of contradicting examples and possible
objections, experiment has confirmed the correctness of the Taub idea,
it goes without saying only for those concrete conditions in which
"Uniterm" type systems are exploited, so that there are no grounds to
ascribe to this idea great universality. What has been said well
illustrates the very important circumstance that now, in the discussed
set of problems, speculative reasonings can be only the initial point
of investigation, but not its replacement; reliable conclusions can be
drawn only on the basis of experiment.

In accordance with the simplest treatment of the Taub idea, if
it is taken literally, the document would have to be issued in answer
to those and only those inquiries, the dictionary composition of
which coincides with its dictionary composition. In such a form the
Taub idea was not adopted by anyone and was not realized, at least
in systems which deserve to be called systems. From the very
beginning this idea was transformed in the following direction: for
delivery of a certain document it is sufficient that it include all
terms from which the inquiry is built.

This additional assumption is not evident and cannot be obtained
deductively as a result of generally significant evident positions.
Moreover, it is easy to think of a situation in which it will be
incorrect, since relevance or irrelevance is not an immanent property
of the document-inquiry pair and can essentially depend on the
conditions of functioning of the system. Such a situation can be
caused, in particular, by the use as information objects of multitheme
documents and documents large in volume. In these conditions it can
happen that a small document wholly dedicated to the subject of the
inquiry is relevant, while a large volume concerning the subject of
the inquiry is not relevant, and this in spite of the fact that in the
large document the subject of the inquiry is assigned
as much space as in the small one. The possibility of such a situation
is anticipated in particular by Harvard University researchers (the
United States), when they offer for calculation of the relevance of
a document a formula according to which relevance turns out to be
inversely proportional to the volume of the retrieval pattern of the
document.
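
The source does not reproduce the Harvard formula itself, only its
inverse dependence on the volume of the retrieval pattern. The sketch
below is therefore an assumed illustration, not the researchers'
formula: it divides the number of inquiry terms found in the pattern by
the number of terms in the pattern. The function name and the sample
terms are hypothetical.

```python
def relevance(inquiry_terms, pattern_terms):
    """Assumed score: matched inquiry terms divided by pattern volume."""
    inquiry, pattern = set(inquiry_terms), set(pattern_terms)
    if not pattern:
        return 0.0
    return len(inquiry & pattern) / len(pattern)


inquiry = {"production", "transformers"}

# A small document wholly dedicated to the subject of the inquiry.
small_pattern = {"production", "transformers"}

# A large multitheme document that merely touches on the same subject.
large_pattern = small_pattern | {
    "motors", "cables", "relays", "insulation",
    "testing", "repair", "storage", "transport",
}

print(relevance(inquiry, small_pattern))
print(relevance(inquiry, large_pattern))
```

With the same two matching terms, the ten-term pattern scores five
times lower than the two-term one, which reproduces the behavior the
paragraph attributes to the formula.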

This assumption can be opposed on the same grounds as the Taub
idea, namely, it is possible to give as a contradicting example some
document, artificial or encountered in practice, which includes the
whole dictionary composition of an inquiry and the relevance of which
nonetheless is more than doubtful. An example is the inquiry
"voltage of generators utilized on submarines" and the title of
the document "repair of high-voltage generators utilized on
submarines." And nevertheless, in spite of possible objections and
the existence of examples contradicting the discussed assumption, it
is confirmed by successful functioning of systems of the "Uniterm"
type; it is confirmed, of course, only for those concrete conditions
in which these systems function.
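
The difference between the literal reading of the Taub idea and the
sufficiency assumption can be sketched as two set comparisons. The
tokenization (splitting on non-letters, so that "high-voltage" yields
"high" and "voltage") and the stop-word list are assumptions made for
this illustration, not details given in the source.

```python
import re

# Hypothetical non-significant words for this example.
STOP_WORDS = {"of", "on"}


def terms(text):
    """Split a phrase into its set of significant words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def literal_match(inquiry, document):
    """Literal reading: the dictionary compositions must coincide."""
    return terms(inquiry) == terms(document)


def uniterm_match(inquiry, document):
    """Sufficiency assumption: the document contains every inquiry term."""
    return terms(inquiry) <= terms(document)


inquiry = "voltage of generators utilized on submarines"
title = "repair of high-voltage generators utilized on submarines"

print(literal_match(inquiry, title))
print(uniterm_match(inquiry, title))
```

Under these assumptions the title is rejected by the literal rule but
issued by the inclusion rule, although its relevance to the inquiry is
doubtful, as the text notes.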


Systems of the "Uniterm" type are based on a third assumption,
namely, on the assumption that entry of all the terms of the inquiry
into the dictionary composition of the document is not only sufficient
but also necessary for their relevance. Like the other two assumptions
already examined by us, the third assumption is not evident and it is
easy to come up with examples contradicting it. These examples break
down into two groups essentially differing from each other.

The first group can be represented by an inquiry in which there
is an adjective or another word limiting its subject. Let us assume
that, for example, there is a text relevant to the inquiry "production
of transformers." Then it can happen that it is also relevant to
the inquiry "production of large transformers" in spite of the fact
that the word "large" does not appear in its dictionary composition.
Practice, however, shows that such examples, though they be completely
real, do not noticeably lower the efficiency of retrieval systems of
the "Uniterm" type and therefore cannot be the cause of transition to
systems of another type.

It is an entirely different story with examples of the second
group, since difficulties connected with them can no longer be
disregarded. Let us consider as an illustration the inquiry
"exploitation of high-voltage equipment" and a document under the
heading of "repair of small-oil circuit-breakers." Not one term of
the inquiry enters the title of the document and nevertheless, even
without turning to the text of the document, it is possible to say
with confidence that it is relevant to the inquiry, since questions
of repair pertain to exploitation, and small-oil circuit-breakers
pertain to high-voltage equipment. Practice shows that such cases
cannot be disregarded, since this would lead to unacceptable losses
of information. Therefore, the Taub principles must be essentially
modified, which gradually leads beyond the limits of the "Uniterm"
type.

In order to consider cases analogous to the above-mentioned
example there is no need to reject the basic idea according to which
the contents of inquiries and documents are transmitted by their
dictionary composition. This idea, however, must be supplemented by
another, in accordance with which between terms essential for
transmission of the contents of texts there can exist relationships
which we call basic: relationships by virtue of which the inquiry and
document can be relevant even without the entry of the dictionary
composition of the inquiry into the dictionary composition of the
document.

At present successfully exploited systems of the "Uniterm" type
are well known, and it can appear that this contradicts the conclusion
that one of the assumptions on which these systems are based is
refuted by the practice of their exploitation. This, however, is not
so.


The basic relationships can be considered in the process of
functioning of retrieval systems by various methods. First of all
it is possible to fix the necessary relationships between terms,
let us say, having assigned these relationships by list, and one
way or another introduce them into the retrieval system, for example,
having assigned an algorithm of comparison of inquiries and documents
using these relationships. Thus, for example, wishing to take account
of the above-mentioned example, it would have been possible to
introduce into the set of terms the asymmetrical relationship of
"subordination" and to subordinate the term "repair" to the term
"exploitation," the term "circuit-breaker" to the term "equipment,"
and the term "small-oil" to the term "high-voltage," and to say that
the document should be issued in answer to the inquiry only if every
term of the inquiry either enters the document itself or is
represented in it by a term subordinated to it.
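
The comparison rule just described can be sketched as follows. The
subordination list holds only the three pairs named in the text; the
names of the functions and the choice to follow chains of subordination
transitively are assumptions of this sketch, since the source does not
say whether subordination is transitive.

```python
# Subordination list from the text: subordinate term -> superior term.
SUBORDINATE_TO = {
    "repair": "exploitation",
    "circuit-breaker": "equipment",
    "small-oil": "high-voltage",
}


def with_superiors(term):
    """Return the term plus every term it is subordinated to.

    Following chains of subordination is an assumption of this sketch.
    """
    found = []
    while term is not None and term not in found:
        found.append(term)
        term = SUBORDINATE_TO.get(term)
    return found


def should_issue(inquiry_terms, document_terms):
    """Issue the document only if every inquiry term is present in it
    directly or is represented by a term subordinated to it."""
    covered = set()
    for term in document_terms:
        covered.update(with_superiors(term))
    return all(q in covered for q in inquiry_terms)


document = {"repair", "small-oil", "circuit-breaker"}
print(should_issue({"exploitation", "high-voltage", "equipment"}, document))

# The relationship is asymmetrical: a document about "exploitation"
# does not answer an inquiry about "repair".
print(should_issue({"repair"}, {"exploitation"}))
```

Keeping the relationships in a plain list, separate from the comparison
function, mirrors the text's suggestion of assigning them "by list" and
introducing them into the system through the matching algorithm.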


If the problem is to construct an automatic retrieval system or
at least a system which would not use the creative abilities of the
person exploiting it in the process of functioning, then this method
of realization of the basic relationships is the only method.
Resorting to this method, we embark on a path which essentially changes
the structure of the reorganized retrieval system and through a
number of intermediate systems leads to systems represented at present
by "Pusto-Nepusto" systems. If, however, it is not necessary to
limit human participation in the retrieval process, the basic
relationships can be realized even without changes in the retrieval
system, with only a complication of the procedure of its use; the
additional burden connected with realization of the basic relationships
is borne not by the retrieval system but by the person exploiting it.

One of the variants of such a complication consists in the fact
that in the retrieval pattern of the document there are inscribed not
only terms present in it but also those the presence of which in the
inquiry would require, in the opinion of the indexer, delivery of the
indexed document. Thus, for example, in the example examined by us,
in the retrieval pattern of the document there should be included not
only the terms "repair," "small-oil," and "switch," but also the terms
"exploitation," "high-voltage," "equipment" and, possibly, still
others at the discretion of the indexer.

Another, in many respects stronger, variant of complication of
the diagram of exploitation of the retrieval system, a complication
also having the purpose of realization of basic relationships, consists
in the fact that instead of one retrieval on one inquiry there are
conducted a number of retrievals on several inquiries, which are
modifications of the initial inquiry. Thus, in the example considered
by us the inquiry "exploitation of high-voltage equipment" could have
been at the discretion of the inquirer supplemented by such of its
modifications as, let us say, "repair of high-voltage equipment,"
"exploitation of small-oil circuit-breakers," "repair of
small-oil circuit-breakers," etc., and retrieval could have been
carried out according to each of these inquiries.
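
A minimal sketch of this second variant follows. It assumes a
hypothetical table ALTERNATIVES of terms which the inquirer is willing
to substitute, and it generates every combination of substitutions,
which is wider than the few modifications listed in the text. The
retrieval itself is a plain "Uniterm"-type comparison; all names and
the sample documents are illustrative.

```python
from itertools import product

# Hypothetical substitutions chosen at the discretion of the inquirer.
ALTERNATIVES = {
    "exploitation": ["exploitation", "repair"],
    "high-voltage": ["high-voltage", "small-oil"],
    "equipment": ["equipment", "circuit-breaker"],
}

def modifications(inquiry):
    """All inquiries obtained by substituting terms of the initial inquiry."""
    options = [ALTERNATIVES.get(term, [term]) for term in inquiry]
    return [frozenset(combo) for combo in product(*options)]

def uniterm_retrieve(inquiry, documents):
    """Issue a document only if it contains all terms of the inquiry."""
    return {name for name, terms in documents.items() if inquiry <= terms}

def retrieve_with_modifications(inquiry, documents):
    """Run one retrieval per modification and unite the deliveries."""
    found = set()
    for variant in modifications(inquiry):
        found |= uniterm_retrieve(variant, documents)
    return found

documents = {
    "doc1": {"repair", "small-oil", "circuit-breaker"},
    "doc2": {"exploitation", "high-voltage", "equipment"},
    "doc3": {"design", "transformer"},
}
initial = ["exploitation", "high-voltage", "equipment"]
print(sorted(retrieve_with_modifications(initial, documents)))
```

The retrieval system is left unchanged here; the whole burden of the
basic relationships lies in the table of substitutions supplied by the
person exploiting the system.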

Thus, using the "Uniterm" system, which issues a document only
if its retrieval pattern contains all terms entering the retrieval
pattern of the inquiry, we can by special procedures of exploitation
of this system find in the end those documents which correspond to
our inquiry, not including its dictionary composition. In all known
cases of the successful exploitation of "Uniterm"-type systems
there are used both above-described methods of their complicated
exploitation, and the need for such special measures directed towards
removal of undesirable consequences of the third assumption indicates
that namely this assumption was not justified.

As has already been said, in those cases in which the basic
relationships have to be realized not by the retrieval system itself
but by a scheme of its use, the developer of the retrieval system
need not make the basic relationships the object of special development,
thereby shifting the burden and responsibility to indexers and
inquirers. But if for considerations such as the requirement of
complete automaticity the realization of basic relationships is
assumed by the retrieval system, special development of a system of
basic relationships turns out to be inevitable.

During development of the basic relationships it is possible
to lean on the most diverse assumptions, which can be confirmed or
refuted only by experiment. Therefore, it is natural to start from
attempts to solve the problem by the simplest means. During
development of the "Pusto-Nepusto-4" as such means there were used,
first, a retrieval language* analogous to the languages of
"Uniterm"-type systems, the words of which (descriptors) were with
rare exceptions translations of natural language terms, and, secondly,
a transitive, asymmetrical and irreflexive predicate partially ordering
the set of descriptors. The system of basic relationships was
constructed as a set of sentences of type P(d, d') assigned by list,
where d and d' are descriptors of the language, and P is an ordering
predicate.

Thus, one of the assumptions on which the logic of the
"Pusto-Nepusto" class is based consisted in the possibility of reaching
the necessary result with the help of paired relationships of the form
P(d, d') not depending on context, where P is a predicate of partial
order. Both the operational experience of the "Pusto-Nepusto-4" system
and the first series of experimental retrievals via the
"Pusto-Nepusto-2" system do not yet give sufficient bases for refutal
of this assumption, although it is already clear that certain
advantages deserving consideration could be given by a system of
relationships depending on the context of the document and inquiry
compared. Such dependence can be realized in many ways, by both direct
and roundabout methods,
detailed  discussion  of  which  is  not  within  the  scope  of  the  present 
report.  Let  us  note  only  that  fixing  of  word  combinations  and 
homonyms  introduced  in  insignificant  quantity  into  the  dictionary  of 
"Pusto-Nepusto"  systems  as  Independent  descriptors  permits  making 
the  necessary  basic  relationships  to  a  certain  extent  essentially 
dependent  on  the  context  of  the  documents.  However,  the  number  of 
word  combinations  and  homonyms  introduced  into  the  language  is  small, 
so that we do not yet have bases to talk about systematic use of
connections  depending  on  context. 

Thus,  the  basic  relationships  are  introduced  specially  so  that 
a  document  pithily  corresponding  to  the  inquiry  could  be  issued  also 
when  it  does  not  include  the  whole  dictionary  composition  of  this 
inquiry. Therefore, the simplest principle of setting basic
relationships P(d, d') between descriptors d and d' taken in this
order is the following principle: let us assume that D and D'
are random documents, and let us assume that the descriptor pattern of
document D' is obtained from the descriptor pattern of document D
by replacement of descriptor d with descriptor d'; if document D' is
relevant to any inquiry relevant to document D, between descriptors
d and d' there should be established relationship P(d, d').

It is probable that certain pithy relationships between ideas
can  be  intimately  connected  with  the  formally  determined  relationship 
P.  Thus,  for  example,  it  is  possible  to  expect  that  generic 
relationships  will  be  Included  in  relationship  P  in  the  sense  that 
every time d is a generic idea with respect to form d', it will be
necessary  to  establish  relationship  P(d,  d')  between  them.  On  this 
count  it  is  possible  to  express  many  assumptions,  which,  however, 
all  need  experimental  check.  It  would  be  interesting,  for  example, 
to  clarify  whether  the  relationship  of  type  to  form  corresponds  with 
that part of relationship P which does not need establishment of
dependence  on  context.  However,  it  is  important  to  emphasize  that 
the  connection  of  the  formal  relationship  P  with  pithy  relationships 
is  an  empirical  fact  and  cannot  be  obtained  as  a  result  of  a  deductive 
conclusion.  In  this  connection  it  is  interesting  to  note  that  such 
an approach to the establishment of basic relationships was clearly
formulated, as far as we know, only in two cases: by Needham's
group in Great Britain, as we learned in 1965 during the
[FID] [expansion unknown] symposium in Moscow, and during
construction of systems of the "Pusto-Nepusto" class.

Appearance  in  the  logic  of  retrieval  systems  of  fixed  basic 
relationships  requires  making  new  decisions  which  need  to  be  checked 
experimentally and can be initially based only on a priori assumptions.

First of all one should note that the above-formulated principle
of establishing relationship P(d, d') does not give us any algorithm
which would indeed allow deciding whether or not relationship P
should be established between random d and d'. This principle can
therefore be considered only a heuristic aid to our intuition, and
only exploitation of the retrieval system can show how successfully
this principle can be put into practice.

Further, even if it is assumed that the principle of establishing
relationship P can be carried out in some sense consistently and
sufficiently effectively, then during formulation of comparison
rules it is nevertheless necessary to make a whole series of
assumptions about this relationship.

For convenience of the following presentation we will say that
descriptor d subordinates descriptor d' to itself, or stands higher
than this descriptor, if relationship P(d, d') is established between
them. Then assumptions made during the construction of the
"Pusto-Nepusto" system can be formulated in the following way.

The first assumption is that the relevance of the document is
left unaffected not only by replacement of one of its descriptors with
the descriptor directly below it, but also by simultaneous replacement
of an arbitrary number of descriptors with arbitrary descriptors
below them. This assumption is not evident and does not come from
the principle of establishment of basic relationships. Nonetheless
one may assume that the practice of experimental exploitation of the
"Pusto-Nepusto" system confirmed this assumption for those concrete
conditions in which the system was tested. It remains unclarified
what role in confirmation of this assumption is played by the
circumstance that almost half of the descriptors of the language were
not connected with any other descriptors; average length of circuits
of  interconnected  descriptors  did  not  exceed  J,  and  the  average 
number of descriptors in an inquiry was equal to 4 (i.e., that during
solution of the problem of document delivery, replacement of not more
than four descriptors was actual).

The second assumption used in the formulation of rules of
comparison in the "Pusto-Nepusto-4" system is connected with
replacement of descriptors with higher descriptors. It was assumed
that relationship P should to a considerable extent coincide with the
relationship of the general to the particular idea and that,
consequently, replacing descriptors with higher ones, we will probably
make the subject of consideration more general. Therefore, in the rules
of comparison it was anticipated that replacement of descriptors with
higher ones lowers relevance but does not make it equal to zero.

On the same basis it was further assumed that presence in the
document of descriptors above the descriptors of the inquiry somewhat
lowers document relevance since it is probable that the document is
not about the subject of interest to the inquirer but about something
more general.

The last two assumptions turned out to be not as well-founded.
They were not confirmed by practice of exploitation of the
"Pusto-Nepusto-4" system, and this circumstance was one of the causes
of transition to the "Pusto-Nepusto-2" system.

The first of these assumptions led to noticeable noise. This
is explained apparently by the fact that on the one hand, relationship
P is connected with generic relationships not as closely as it seemed
to be at first, and on the other by the fact that transitivity of
relationship P led to delivery in answer to very concrete inquiries
of a large number of such general documents that they must have been
of no real value to the inquirer.

The last of the assumptions now examined could not lead to
serious troubles, since it influenced not the composition of issued
documents but only their distribution with respect to the echelons
of delivery. Nonetheless the unnaturalness of distribution of
documents by echelons was in certain cases so evident that this had
to be grasped by the inquirer as a deficiency.

In connection with what has been said, the following assumptions
were made the basis of the "Pusto-Nepusto-2" system.

First of all, the "Pusto-Nepusto-2" system rests on that
fundamental idea of Taube according to which the contents of the
document and inquiry are transmitted by their dictionary composition
with fullness sufficient for the purpose of information retrieval.

Further, as in the "Pusto-Nepusto-4" system it was assumed that
for delivery of a certain document it is necessary and sufficient
that every descriptor of the inquiry be represented in the retrieval
pattern of the document either by a descriptor equal to it or by some
descriptor connected with it by basic relationships.

But in contrast to the "Pusto-Nepusto-4" system it was now
necessary to reject the assumption that the necessary result can be
reached with the help of one transitive predicate P, partially
ordering the set of descriptors. Instead of this the basis of the logic
of the "Pusto-Nepusto-2" was the assumption of the two predicates
P^1 and P^2. The first of them coincides by and large with predicate
P of the "Pusto-Nepusto-4" system and P^1(d, d') can be as before read
"d' is below d." The second predicate in contrast to the first is
not transitive and to a well-known degree can be grasped as a
predicate, having a large number of exceptions, reverse to P^1: in a
noticeable number of cases P^2(d, d') is equivalent to P^1(d, d').
P^2(d, d') is read "d is above d'." Thus, in the "Pusto-Nepusto-2"
system "d is above d'" and "d' is below d" are not one and the
same, and, furthermore, it can happen that "d is above d'," "d' is
above d''," but "d is not above d''."


The basic relationships of the "Pusto-Nepusto-2" system were
developed according to the principle usual for "Pusto-Nepusto" systems
which is described above and which was used in its time by Needham.
They remained independent of context just as in the "Pusto-Nepusto-4"
system and with the same reservations with respect to word
combinations and homonyms.

It  is  necessary  to  note  that  the  generalization  of  basic 
relationships made in the "Pusto-Nepusto-2" system as compared to
the "Pusto-Nepusto-4" system is not, of course, the only conceivable
one. But it is the simplest one which allows counting on removal
of deficiencies revealed in the logic of the "Pusto-Nepusto-4" system.

The rules of comparison of the "Pusto-Nepusto-2" system also
rest on assumptions different from the corresponding assumptions of
the "Pusto-Nepusto-4" system. These assumptions can be formulated
in  the  following  way: 

1. If an inquiry descriptor in a document is replaced with a
lower descriptor, this in no way reflects on the relevance of the
document.

2. If a document for a certain inquiry descriptor has not only
an equal or lower but also a higher one, then this in no way affects
document relevance.

3.  If  in  the  document  for  a  certain  inquiry  descriptor  there 
is  neither  an  equal  nor  a  lower  one,  but  at  least  one  higher  one, 
then  this  lowers  document  relevance,  but  does  not  make  it  equal  to 
zero. 
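
The three rules can be illustrated by the following sketch. It is not
the program of the "Pusto-Nepusto-2" system; the relationship lists,
descriptor names and function names are hypothetical, P^1 is followed
transitively while P^2 is consulted only directly, and the lowered
relevance of rule 3 is represented, as an assumption, by placing the
document in the second echelon.

```python
# Hypothetical relationship lists; descriptor names are illustrative only.
# P1 holds pairs (d, d') with "d' is below d"; it is treated as transitive.
P1 = {("equipment", "circuit-breaker"),
      ("circuit-breaker", "small-oil circuit-breaker")}
# P2 holds pairs (d, d') with "d is above d'"; it is not transitive.
P2 = {("equipment", "circuit-breaker")}

def below(d):
    """All descriptors below d, following P1 transitively."""
    found, stack = set(), [d]
    while stack:
        top = stack.pop()
        for higher, lower in P1:
            if higher == top and lower not in found:
                found.add(lower)
                stack.append(lower)
    return found

def above(d):
    """Descriptors directly above d by P2; no transitive closure is taken."""
    return {higher for higher, lower in P2 if lower == d}

def echelon(inquiry, document):
    """Return 1 or 2 for the echelon of delivery, or None if not issued."""
    lowered = False
    for q in inquiry:
        if document & ({q} | below(q)):
            continue            # rules 1 and 2: equal or lower descriptor present
        if document & above(q):
            lowered = True      # rule 3: only a higher descriptor present
        else:
            return None         # inquiry descriptor not represented at all
    return 2 if lowered else 1

print(echelon({"equipment"}, {"small-oil circuit-breaker"}))
print(echelon({"circuit-breaker"}, {"equipment"}))
```

Because P^2 is not closed transitively, an inquiry about small-oil
circuit-breakers does not bring out a document indexed only by
"equipment" in this sketch, which is the kind of overly general
delivery the text attributes to the transitivity of the single
predicate of the earlier system.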


From what has been said it is easy to see that "Pusto-Nepusto-2"
logic makes it possible to split delivery only into 2 echelons
against the 4 echelons of the "Pusto-Nepusto-4" system, where
echelons of the "Pusto-Nepusto-2" system cannot be obtained by grouping
echelons of the "Pusto-Nepusto-4" system.

Now the "Pusto-Nepusto-2" system is in experimental exploitation
and at the present it is early to draw conclusions about results of
the conducted reorganization of the logic of systems of the
"Pusto-Nepusto" class. Nevertheless, preliminary data bear witness to
the fact that delivery of the new system is fuller than that of the old
one and contains noticeably less noise. For the present it is
difficult to say how essential the loss of the possibility to divide
delivery into four echelons is: if this proves to be an essential
deficiency of the new system, then it will most likely be essential
only in arrays of the order of several hundred thousand, which we
have not yet achieved.


THE "SETKA-3" AUTOMATED IPS ON THE "MINSK-22" WITH THE
USE OF THE SOCKET ASSOCIATIVE-ADDRESS METHOD OF
ORGANIZATION OF INFORMATION

S.  A.  Gorokhov 

Since 1964 at [NIIEIR] (НИИЭИР) [expansion unknown] there have
been developed several fundamentally different approaches to
construction (machine realization) of automated information-retrieval
systems ([IPS] (ИПС)) of the descriptor type on the "Minsk-22" and the
"Minsk-2" [1]. There were examined IPS using in diverse variants
the nodal associative-address method of organization of initial
information and the "zonal" method with various principles of coding
initial information. In spite of the fact that during the
associative-address method of retrieval there is accepted direct
organization of initial information, it turned out to be possible for
such IPS to use the inverse method of retrieval [2], which permitted
significantly lowering the consumption of machine time expended on
realizing inquiries.

As criterion of semantic conformity of the contents of the
inquiry to the contents of the document in the mentioned IPS there
is accepted the entry, sometimes with certain elements of grammar,
of all the descriptors of the inquiry into the retrieval pattern of
the document [POD] (ПОД).^1 In spite of its logical simplicity,
which made it possible considerably to simplify the IPS algorithm,
the given criterion of semantic conformity together with principles
laid down during indexing of the document ensures sufficiently high
output characteristics of IPS.

^1 Under elements of grammar here there is understood assignment
of weight (0 or 1) to descriptors of the inquiry depending upon the
semantic load which they carry in the inquiry, and the connection of
descriptors in the inquiry by the connectives "AND" and "NOT."

Below there is examined an improved variant of the "Setka-3"
IPS on the "Minsk-22" with use of the socket associative-address
method of organization of information.

As initial information in the examined IPS there are taken two
thematic divisions from [GSBK] (ГСБК) [expansion unknown] NIIEIR:
"Computer technology," consisting of two parts (20 thousand and 12
thousand documents), and "Transformers" (1.3 thousand documents).

I. Basic Characteristics of the "Minsk-22"

For the best understanding of certain sides of the work of IPS
essentially connected with ETsVM possibilities, we will give the
basic characteristics of the "Minsk-22."

The "Minsk-22" is a two-address ETsVM with a speed of 5-6
thousand operations per second. Its calculation grid consists of
37 bits. Magnetic working storage (internal memory) consists of
ferrite cores and contains 8192 37-bit words. External memory is
magnetic-tape storage consisting of 16 tape-drive mechanisms. On
any tape-drive mechanism there can be set magnetic tape accommodating
an average of 75 thousand 37-bit words. On magnetic tape information
is recorded in zones. The volume of one zone is 2048 words.
Information is recorded on magnetic tape from fast store in any
place of a zone, and readout of information from magnetic tape to
fast store is also possible from any place of a zone. It is possible
to put out 4096 words in two zones in succession.

The "Minsk-22" allows input of numerical (binary and binary-decimal
system) and alphabetic (Cyrillic and Latin alphabets)
information. Initial information can be fed to the "Minsk-22" both
from punched tape and from punched cards.

Calculation results (numerical and alphabetic information) are
output differently. Numerical information in octal and decimal
systems is printed at a rate of 20 twelve-character lines per second.

Alphabetic information is printed on an alphameric printer
([ATsPU] (АЦПУ)). The maximum length of a line of ATsPU printing is
128 symbols, and the rate is [illegible] lines per second.

II. Information-Retrieval System

Application of ETsVM for information retrieval essentially
constitutes an attempt to ensure more convenient and rapid access
to accumulated knowledge, so that developers and scientific workers
are given timely and complete scientific and technical information
needed by them in their work.

Examining the work of automated IPS, it is necessary to note
that in an absolute majority of contemporary IPS ETsVM "take over"
the main part of the "mechanical work": retrieval in a large array
of scientific and technical information but one built in a certain
way, delivery of answers to inquiries in a predetermined form, etc.

The person servicing the IPS in this case indexes documents for putting
into the IPS, which is connected with semantic appraisal of the
contents of documents of scientific and technical information,
semantically analyzes and indexes entering inquiries, composes a
dictionary of descriptors, etc. It is obvious that in the near future
some of these functions will be wholly and some partially fulfilled
by ETsVM. Certain prerequisites to this will be shown below.

Let us consider now the construction and functioning of the
main parts of automated IPS developed at the NIIEIR.

1. Creation of a Dictionary of Descriptors for the
Thematic Division "Computer Technology"

The first stage in the work of creating an automated-IPS
language was the indexing of documents of the thematic division
"Computer technology." The indexer was tasked with expressing the
basic contents of the indexed document as exactly and as briefly as
possible. The indexer need not limit himself to the terminology used
in a given document when it is necessary to use a word which conveys
the contents of the document more exactly or more generally, although
it is not contained in it.

Thus, at the first stage of work there was created a POD card
file of the division "Computer technology." A POD of the division on
the average consists of 8 words. The greater part of these words
expresses the basic semantic contents of the document, and the lesser
part bibliographic data about the document and the name of the
equipment or development on the whole in the given document.

The following stage was connected with the work of ETsVM and
consisted in counting frequency of recurrence of various POD terms.

For this the words in the POD were normalized, that is, they were
reduced to the standard form of recording: masculine gender,
nominative case, and singular number. Stable word combinations of
the type "arithmetic unit," "magnetic drum" and similar ones were
replaced by the abbreviations ["au," "mb"] ("ау," "мб"). From the
POD there were excluded prepositions, conjunctions, and other words
not essential for transmission of the basic contents of documents.
Normalized POD obtained after that were perforated and were
introduced into the ETsVM. Then a special program calculated frequency
of recurrence of every word in the array of normalized POD and these
data were printed.

The total number of normalized words in the POD of the first
part of the division "Computer technology" exceeded 120 thousand.
Sixty-five hundred different words were obtained. The frequency
of recurrence of individual words varies from one to 2000.

The next stage of work is composition of the dictionary of
descriptors. By the table of frequency of recurrence of words in
the POD there were extracted words the frequency of recurrence of
which is higher than a certain numerical threshold (in the examined
case the number 10). The total number of these words was 600-700.


These words formed the main part of the dictionary of descriptors.
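
The counting stage and the selection by threshold can be sketched as
follows. The sketch is illustrative only: STOP_WORDS is a hypothetical
exclusion list, the grammatical normalization (gender, case, number)
performed by the indexers is not attempted, and the function names are
not those of the original program.

```python
from collections import Counter

STOP_WORDS = {"of", "and", "the", "for", "in"}   # hypothetical exclusion list
ABBREVIATIONS = {"arithmetic unit": "au", "magnetic drum": "mb"}

def normalize(pod):
    """Replace stable word combinations and drop inessential words."""
    text = pod.lower()
    for phrase, short in ABBREVIATIONS.items():
        text = text.replace(phrase, short)
    return [word for word in text.split() if word not in STOP_WORDS]

def frequency_table(pods):
    """Frequency of recurrence of every word in the array of normalized POD."""
    return Counter(word for pod in pods for word in normalize(pod))

def frequent_words(pods, threshold):
    """Words whose frequency is higher than the threshold."""
    return {word for word, n in frequency_table(pods).items() if n > threshold}

pods = [
    "design of the arithmetic unit",
    "arithmetic unit for control",
    "magnetic drum memory",
]
print(frequent_words(pods, threshold=1))
```

In the work described the threshold was applied to a table covering
more than 120 thousand word occurrences; the three sample POD here
only show the mechanics.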

From a large part of the words the frequency of recurrence of which
is lower than the threshold shown, there were formed classes
of equivalence. The words of the remaining smaller part were either
replaced in the POD with words which are encountered more frequently
and are then introduced into certain classes of equivalence or
removed from consideration.

In a class of equivalence there were united words the presence
of one of which in the inquiry with a high probability will require
delivery of documents indexed by other words of the given class of
equivalence.

One of the words of the class of equivalence, usually the one
most fully expressing the semantic value of the given class, was
called a descriptor, and all the remaining words of the class of
equivalence were called key words. Along with frequently encountered
words, a certain part of the words promising for the field of computer
technology but encountered fewer times in the array was also promoted
to descriptors.

Thus there were formed classes of equivalence, that is,
81^ descriptors were determined.

Words which are close in meaning, but not close enough to be
united into one class of equivalence, are supplied with the reference
"see also." This reference is needed later for possible expansion
of delivery of answers to inquiries.
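
A dictionary built on classes of equivalence with "see also"
references might be represented as below. The classes, the words in
them and the function names are invented for illustration and are not
taken from the NIIEIR dictionary.

```python
# Hypothetical classes of equivalence: descriptor -> key words of its class.
CLASSES = {
    "computer": {"calculator", "etsvm"},
    "memory": {"storage", "store"},
    "tape": set(),
}
# Hypothetical "see also" references between close but separate classes.
SEE_ALSO = {"memory": {"tape"}}

# Every descriptor and key word points to the descriptor of its class.
WORD_TO_DESCRIPTOR = {descriptor: descriptor for descriptor in CLASSES}
for descriptor, key_words in CLASSES.items():
    for word in key_words:
        WORD_TO_DESCRIPTOR[word] = descriptor

def index_words(words):
    """Translate words into descriptors; words outside the dictionary are dropped."""
    return {WORD_TO_DESCRIPTOR[w] for w in words if w in WORD_TO_DESCRIPTOR}

def expand(descriptors):
    """Widen a set of descriptors by their "see also" references."""
    widened = set(descriptors)
    for descriptor in descriptors:
        widened |= SEE_ALSO.get(descriptor, set())
    return widened

print(index_words(["storage", "calculator", "unknown"]))
print(expand({"memory"}))
```

Since a key word and its descriptor are translated to the same
descriptor, a document indexed by "storage" and an inquiry containing
"store" meet in the class "memory" without any comparison of the words
themselves.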

Besides  semantic  descriptors,  in  the  dictionary  there  were 
included  3^  descriptors  of  a  bibliographical  character  and  a  certain 
number  of  descriptors  designating  the  names  of  equipment,  developments, 
and  firms. 

Thus,  in  the  dictionary  of  the  thematic  array  "Computer 
technology" there were included about a thousand descriptors.


2.  Construction  of  a  Machine  Dictionary  of 
Descriptors  of  an  Automated  IPS 

The carrier of the machine dictionary of descriptors of the
improved variant of the "Setka-3" automated IPS on the "Minsk-22" is
punched cards.

In the upper part of the punched card with the help of a
typewriter there is recorded the designation of the descriptor or
key word and reference number in the dictionary of descriptors of
the class of equivalence to which the given key word belongs or which
the given descriptor determines.

Then two lines are punched in the punched card. In the first
of them there is shown the first information line (IS_1) of
the descriptor subarray of the given class of equivalence, and in
the second the second information line (IS_2). Their structure and
assignment will be examined below. The descriptor and all key
words entering the class of equivalence of the given descriptor have
identical IS_1 and IS_2.

All punched cards of the machine dictionary of descriptors
are collected in a card file, in which they are located in alphabetic
order of descriptors and key words. On the average the card file
contains every punched card in triplicate or quadruplicate, since
answers to several inquiries can be retrieved simultaneously in the
IPS under consideration. Furthermore, there are a certain number of
punched cards with the designation of operation of negation.

3. Processing of Inquiries and Their Input into ETsVM

Inquiries for retrieval with the help of automated IPS initially
have no limitations placed on them, besides the wish to most exactly
formulate the object of retrieval.

Then the specialist servicing the IPS according to the given
thematic division analyzes the incoming inquiry and indexes it
(that is, translates the contents of the inquiry into the terms of
the dictionary of descriptors of the thematic division); after that
it is determined whether the inquiry is complicated or simple. A
simple inquiry is one which belongs to only one thematic division
and the descriptors in which are united by connectives "AND" and
"NOT." Any inquiry not satisfying these conditions is complicated. A
complicated inquiry is reduced to the sum of simple inquiries if
possible, or is reformulated.

Then punched cards with the descriptors of the simple inquiry
will be selected from the card file of the machine dictionary of
descriptors. The punched card of the negated descriptor is preceded
by the punched card of negation. Inquiry is separated from inquiry
for machine realization by a special punched card.

The group of inquiries thus selected (the maximum number of
inquiries in the group must not be over 36) is put into the reader
and fed into working storage by the program.

4. Recording of Initial Information in the ETsVM

Initial information for retrieval in the automated IPS is an array
of reference numbers of documents in GSEK NIIEIR according to the
corresponding thematic division of documents.

The examined IPS uses the inverse method of retrieval and,
accordingly, the inverse form of organization of initial
information, i.e., the array of initial information is
recorded in the form of so-called descriptor subarrays.

Descriptor subarrays are sets of reference numbers of documents
recorded in order of increasing absolute values and pertaining to
one descriptor (class of equivalence). The number of these subarrays
is determined by the number of descriptors in the dictionary, and
their dimensions by the frequency of occurrence of the given
descriptor in the POD of the examined thematic division of documents.

For the purpose of saving memory and speeding up retrieval, the
IPS uses position coding of descriptor subarrays and
the two-stage method of retrieval of documents^ answering the inquiry.
The principle of position coding consists in the fact that the number
of a document is not written in the form of a number in this or that
number system, but is determined when necessary as the number of
a position reckoned from a certain origin. A working-storage bit
was taken as a position element. Since one or zero can be recorded
in any bit of a working-storage cell, let us agree to mark an
occupied position one, and a free one zero.
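
The principle can be illustrated with a short sketch. It is not the program of the IPS described here: a Python integer stands in for a run of working-storage bits, and the function names and sample document numbers are assumptions made only for illustration.

```python
def encode_positions(doc_numbers):
    """Mark each document number as an occupied position (bit set to one)."""
    bits = 0
    for n in doc_numbers:
        bits |= 1 << n          # position n, reckoned from bit 0 as the origin
    return bits


def decode_positions(bits):
    """Recover document numbers as the positions of the bits set to one."""
    numbers = []
    position = 0
    while bits:
        if bits & 1:
            numbers.append(position)
        bits >>= 1
        position += 1
    return numbers


# Hypothetical descriptor subarrays for two descriptors.
magnetic = encode_positions([3, 17, 42])
storage = encode_positions([17, 42, 58])

# Documents carrying both descriptors: one logical multiplication (AND).
common = magnetic & storage
print(decode_positions(common))
```

With this representation, finding the documents common to several descriptors costs one logical operation per cell instead of a comparison of number lists.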

Thus, a working-storage cell can be considered a position
section, which will henceforth be called a position interval.

However, during direct position coding of an array of initial
information the descriptor subarrays become rather large
and a large part of the position intervals in them turn out to be
empty. Therefore, indirect position coding was used, for which the
structure of the position interval had to be modified.

Every position interval was ascribed its reference number in
the general sequence of position intervals of the examined thematic
division. The position interval was cut down to 25 bits, and the
remaining 11 bits were used for recording the reference number of the
position interval. All empty position intervals were removed from
the descriptor subarrays.
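
A minimal sketch of indirect position coding follows, under stated assumptions: cells are 36 bits wide (25 position bits plus 11 bits for the interval number), the interval number sits in the low-order 11 bits, documents are numbered from 1, and intervals are numbered from 0. The text fixes only the 25/11 split; the rest, including the function name, is chosen here for illustration.

```python
POS_BITS = 25   # position part of a cell
NUM_BITS = 11   # part holding the reference number of the position interval


def pack_second_subarray(doc_numbers):
    """Build the non-empty 36-bit cells for one descriptor.

    Each cell is (position bits << 11) | interval number. Empty position
    intervals never appear, which is the point of indirect coding.
    """
    masks = {}
    for n in doc_numbers:
        interval, offset = divmod(n - 1, POS_BITS)   # offset is 0..24
        if interval >= 1 << NUM_BITS:
            raise ValueError("interval number does not fit in 11 bits")
        masks[interval] = masks.get(interval, 0) | (1 << offset)
    # Cells are kept in order of increasing interval number.
    return [(mask << NUM_BITS) | interval
            for interval, mask in sorted(masks.items())]


cells = pack_second_subarray([1, 26, 27, 1000])
print(len(cells))   # three non-empty intervals instead of forty
```

Under these assumptions 11 bits address 2048 intervals of 25 positions each, that is, 51,200 documents per thematic division.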

The subarray of numbers of documents consisting of the non-empty
position intervals belonging to a certain descriptor is called the
second descriptor subarray, and the position intervals of this
subarray are called second position intervals.

To speed up retrieval every descriptor from the dictionary was
assigned a subarray in which direct position code
marked all numbers of position intervals entering the second
descriptor subarray of the given descriptor. This subarray is called
the first descriptor subarray, and position intervals of this subarray
are called first position intervals.

^At present the possibility of a three-step retrieval method is
under consideration.

First descriptor subarrays have a constant size for all descriptors
of one thematic division of documents. Their size is determined
by the total number of documents in this division.

The first and second subarrays in the ETsVM are recorded on
magnetic tape in two large groups. Into the first group are placed
all first descriptor subarrays and into the second all second ones.

So that any of the first descriptor subarrays can be later
supplemented, each of them ends in a set of empty cells. On magnetic
tape the first descriptor subarrays are recorded in order of decreasing
number of position intervals in the corresponding second descriptor
subarrays.

Second descriptor subarrays are subarrays of variable dimensions.
Position intervals in them are located in order of their increasing
numbers. On magnetic tape one subarray is separated from another by
a set of empty cells. When this set has been completely filled, its
last cell receives the address indicating the place of
recording on magnetic tape and the dimensions of the new subarray,
which is a continuation of the completely filled set. Thus the
recording of second descriptor subarrays is turned into a socket
associative-address structure.

On magnetic tape second descriptor subarrays are recorded in
order of decreasing number of numbers of documents in them.

Let us consider now the structure of the first and second
information lines.

Each of the subarrays of any descriptor is put into correspondence with
an information line fixed in the dictionary of descriptors. It
indicates the place of the given descriptor subarray on
magnetic tape and its size.

The structure of the first information line follows:

$N_z \quad A_1 \quad n,$

where $N_z$ is the number of the magnetic tape zone on which the given
first descriptor subarray is recorded; $A_1$ is the initial address
of the first descriptor subarray in the zone; and $n$ is the number
of cells occupied by this subarray.

It  is  assumed  that  all  first  descriptor  subarrays  are  recorded 
on  magnetic  tape  hung  on  the  zero  tape-drive  mechanism. 

The structure of the second information line is

$N_l \quad N_z \quad A_2 \quad m,$

where $N_l$ is the number of the tape-drive mechanism on which the
corresponding magnetic tape is hung; $N_z$ is the number of the
magnetic-tape zone on which the second descriptor subarray is
recorded; $A_2$ is the initial address of the second descriptor subarray
in the zone; $m$ is the number of cells occupied by this subarray.

The number of the tape-drive mechanism is shown because second
descriptor subarrays can be disposed on several magnetic tapes hung
on different tape-drive mechanisms accessible simultaneously.
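
The two kinds of information line can be pictured as small records. The sketch below is an assumption-laden illustration: the class and field names are invented here, and the sort order simply mirrors the idea of reading tape zones from the minimum number to the maximum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirstInfoLine:
    zone: int      # number of the magnetic-tape zone
    address: int   # initial address of the first descriptor subarray in the zone
    cells: int     # number of cells occupied by the subarray


@dataclass(frozen=True)
class SecondInfoLine:
    drive: int     # number of the tape-drive mechanism
    zone: int      # number of the magnetic-tape zone
    address: int   # initial address of the second descriptor subarray in the zone
    cells: int     # number of cells occupied by the subarray


def read_order(lines):
    """Order second information lines by drive, then zone, then address."""
    return sorted(lines, key=lambda l: (l.drive, l.zone, l.address))


# Hypothetical information lines for three descriptors of one inquiry.
lines = [
    SecondInfoLine(drive=1, zone=4, address=100, cells=7),
    SecondInfoLine(drive=0, zone=9, address=0, cells=3),
    SecondInfoLine(drive=0, zone=2, address=50, cells=12),
]
ordered = read_order(lines)
print([(l.drive, l.zone) for l in ordered])
```

The first information line needs no drive field because all first descriptor subarrays are assumed to sit on the tape hung on the zero mechanism.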

5. Description of the Algorithm of Work of the IPS

Documents answering an inquiry in the examined IPS are found
in two stages.

The first stage is the stage of rough retrieval. At this stage
the numbers of second position intervals common to
all the descriptors of the inquiry are selected. This
considerably narrows the scope of information coming into play in
the subsequent processing.

The second stage is the stage of sampling the common position
intervals in the second descriptor subarrays of the inquiry and
separating in them the numbers of documents common to all the
descriptors of the inquiry, which by virtue of the accepted criterion
of semantic conformity signifies finding the numbers of documents
answering the inquiry submitted.

Introduction  of  the  two-stage  method  of  retrieval  permitted 
accelerating  the  retrieval  process  and  made  possible  simultaneous 
retrievals  in  response  to  36  inquiries. 
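
The two stages can be sketched in a few lines. This is a simplified model, not the program of the IPS: tape zones, fast store, and negation are left out, Python integers and dictionaries stand in for cells, documents are assumed to be numbered from 1 and intervals from 0, and all names and sample data are hypothetical.

```python
POS_BITS = 25


def build_subarrays(doc_numbers):
    """Return (first, second) descriptor subarrays for one descriptor.

    second maps interval number -> 25-bit position mask (non-empty only);
    first is a bit mask marking which interval numbers occur in second.
    """
    second = {}
    for n in doc_numbers:
        interval, offset = divmod(n - 1, POS_BITS)
        second[interval] = second.get(interval, 0) | (1 << offset)
    first = 0
    for interval in second:
        first |= 1 << interval
    return first, second


def two_stage_search(index, descriptors):
    if not descriptors:
        return []
    # Stage 1 (rough retrieval): the place is prepared with ones in all
    # bits, then ANDed with every first descriptor subarray.
    rough = -1
    for d in descriptors:
        rough &= index[d][0]
    # Stage 2: only intervals that survived stage 1 are examined.
    answers = []
    for interval in range(rough.bit_length()):
        if not (rough >> interval) & 1:
            continue
        mask = (1 << POS_BITS) - 1
        for d in descriptors:
            mask &= index[d][1][interval]
        for offset in range(POS_BITS):
            if (mask >> offset) & 1:
                answers.append(interval * POS_BITS + offset + 1)
    return answers


index = {
    "magnetic": build_subarrays([5, 30, 700]),
    "storage": build_subarrays([5, 31, 700, 900]),
}
result = two_stage_search(index, ["magnetic", "storage"])
print(result)
```

In this sample the interval holding documents 30 and 31 passes the rough stage but yields nothing in the second stage, which is exactly the case the second stage exists to filter out.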

Let  us  consider  in  detail  fulfillment  of  each  of  these  stages. 

A group of information lines of inquiry descriptors is put into
working storage from punched cards. After that the group is split
up into two subarrays: the subarray of the first information lines
and the subarray of the second information lines, and a table is
formed, in which there are fixed the number of the inquiry, the number
of descriptors (information lines) in it, and the initial address
of the subarray of the first and second information lines of the
corresponding inquiry. Information lines of negated descriptors are
recorded with minus signs.

After that among positive IS1 [information lines] there are found those
the number of the magnetic-tape zone of which is maximum and minimum.
The zone with the minimum number is read into fast store and from
it there are separated the necessary first descriptor subarrays,
which are then put with the help of the operation of logical
multiplication into an earlier-prepared place in fast store. Preparation
of the place in fast store consists in recording ones in all bits
of a certain set of successive cells, which will allow carrying out
logical multiplication even for the first subarray.




After full processing of the zone its number is compared with
the maximum number found. If these numbers did not match, a new
minimum zone number is found among the remaining positive IS1 and
the zone found is processed as described above. If, however, the
zone numbers matched, rough retrieval is over and it is necessary
to go to a new stage of processing: checking the equality to zero
of all the lines of the result of processing of the first descriptor
subarrays of each of the inquiries. In case these lines are equal to
zero the given inquiry is not further processed. Its number is
printed with a minus sign. Otherwise, after completely checking the
whole array, one goes to the next stage of processing.

The next stage of processing is preparation of a place in fast
store for the second part of retrieval, namely finding the numbers of
documents answering the inquiry.

There is formed table T2, in which there are noted the number of the
inquiry, the number of second position intervals which must be further
processed in response to the given inquiry, and the initial address in
fast store of the corresponding subarray of second position intervals.

A place in fast store for the second stage of retrieval is
prepared in the following way. In the first 25 bits of the cells
there are stored ones, and in the 11 last ones the number of the
second position interval for which further finishing is required.

Thus,  for  each  of  the  inquiries  there  are  recorded  as  many 
lines,  as  there  are  position  intervals  in  it  requiring  finishing. 

The second part of retrieval starts with finding the maximum
number of tape-drive mechanism and the maximum zone on it and also
the minimum number of tape-drive mechanism and minimum zone on it
among second information lines.

The minimum zone is read into fast store, and from the corresponding
second descriptor subarray there are selected second position
intervals which require finishing, and with the help of logical
multiplication they are put into their place in fast store. The
position part of the second position intervals of negated descriptors
is inverted before superposition with the help of logical
multiplication.
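
The treatment of a negated descriptor can be sketched as follows. The 25-bit position part comes from the text; the function name and sample masks are assumptions made for illustration.

```python
POS_MASK = (1 << 25) - 1   # 25 ones: the prepared place in fast store


def superpose(accumulator, position_part, negated=False):
    """AND one descriptor's position part into the accumulated result.

    For a negated descriptor the position part is inverted first, so the
    documents that carry the descriptor are the ones struck out.
    """
    if negated:
        position_part = ~position_part & POS_MASK
    return accumulator & position_part


acc = POS_MASK                                  # place prepared with ones
acc = superpose(acc, 0b1011)                    # descriptor A: positions 1, 2, 4
acc = superpose(acc, 0b0010, negated=True)      # NOT descriptor B: position 2
print(bin(acc))
```

Masking with `POS_MASK` after inversion keeps the result within 25 bits, so the interval-number part of a cell is never disturbed.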

Second position intervals are processed similarly until the
maximum zone on the maximum tape-drive mechanism has been reached.

The last processing stage is the decoding of the position code
and delivery of answers to inquiries. Position code is decoded by
the formula

$N_{pi} \cdot 25 + P,$

where $N_{pi}$ is the number of the position interval and $P$ is the number
of the position in the given position interval.
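
The decoding step can be sketched as below. Assumptions: a cell keeps the interval number in its low-order 11 bits and the 25 position bits above them, intervals are numbered from 0, and positions are numbered 1 through 25. Only the formula itself comes from the text.

```python
POS_BITS = 25
NUM_BITS = 11


def decode_cell(cell):
    """Return the document numbers encoded in one result cell."""
    interval = cell & ((1 << NUM_BITS) - 1)     # number of the position interval
    positions = cell >> NUM_BITS                # 25-bit position part
    return [interval * POS_BITS + p             # document number = interval * 25 + P
            for p in range(1, POS_BITS + 1)
            if (positions >> (p - 1)) & 1]


# Hypothetical cell: interval number 2, positions 1 and 3 occupied.
cell = (0b101 << NUM_BITS) | 2
print(decode_cell(cell))
```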

6. Forms of Deliveries of Answers to Inquiries

In the examined IPS provision is made for two forms of deliveries
of answers to inquiries.

The first form is delivery of reference numbers of documents
from the corresponding thematic division of GSEK NIIEIR.

The second form is delivery of bibliographic descriptions of
documents recorded under the given numbers. Delivery of bibliographic
descriptions of documents is carried out at the option of the inquirer.

Examples of deliveries follow:

+ 0000 0000 0001
+ 0000 23951
+ 0000 0000 0002
+ 0000 23953

where the first and third lines designate the reference numbers of
inquiries, and the second and fourth the reference numbers of
abstracts issued in response to the corresponding inquiries.

Inquiry 1.

23951. Programs of addition and subtraction of sine with floating
point on the "Ural-1." Klokachev I. V. In the collection
"Solution of engineering problems with electronic computers."
L., 1963, 8-13.

Inquiry 2.

23953. (Algorithms in Algol-60). Wegstein J. H. Algorithms.
"Communs Assoc. Comput. Mach." 1963, 6, No. 8, 441-450
(English).

7. IPS Characteristics

Let us give some IPS characteristics.

1. Number of inquiries simultaneously serviced ........ 36

2. Average time (machine) of retrieval for
   one inquiry ........................................ 5-7 s

3. Number of magnetic-tape zones necessary
   for storage:

   a) of descriptor subarrays (20 thousand
      documents) ......................................

   b) of bibliographic descriptions (20
      thousand documents) ............................. 300-400

4. Number of inquiries satisfied per shift:

   a) answer in the form of reference
      numbers of documents ............................ 1-1.5 thousand

   b) answer in the form of bibliographic
      descriptions of documents ....................... 0.5 thousand

EXPERIENCE IN CREATING INFORMATION-RETRIEVAL
LANGUAGE VIA COMPUTER TECHNOLOGY


V. K. Vakhabov, A. A. Mikhaylova, L. M. Yesilevskaya and
T. S. Kutayeva

At the Perm Scientific Research Institute of Control Machines
and Systems attempts are being made to create an automated
information-retrieval system [IPS] (ИПС) for a reference and information
fund [SIF] (СИФ) of the [ONTI] (ОНТИ) [Association of Scientific
and Technical Publishing Houses] of the instrument.

An important IPS element is information-retrieval language
[IPYa] (ИПЯ). The present report describes the development of an
information-retrieval language of the descriptor type according to
the division of "Computer technology."

During selection of the IPYa structure the following peculiarities
of IPS operation were considered.

1.  High  productivity  of  retrieval  (up  to  three-four  thousand 
inquiries  in  a  day)  via  application  of  a  magnetic-drum  electronic 
digital  computer. 

2.  The  need  for  machine  translation  of  key  words  in  the  retrieval 
instruction  into  codes  of  descriptors  to  increase  retrieval 
productivity. 

3. Absence of direct feedback with the user during machine
retrieval for correction of the retrieval instruction for the
purpose of obtaining the required fullness and accuracy. Feedback
is achieved in the system via application of a three-circuit IPS
system, where the first circuit does not have feedback (Fig. 1).


Fig.  1.  Block  diagram  of  three-circuit  IPS. 

Under such conditions presence of noise requires only a certain
increase in the productivity of the secondary circuit. Therefore,
noise is less important than losses.

4. The whole information array in the branch center is split
into a number of big thematic subarrays with a volume of the order
of 30 thousand documents. For each of the subarrays its own local
IPYa is developed.

The enumerated peculiarities of IPS operation determine the most
important features of the information-retrieval language developed:

1.  The  language  has  basic  relationships  of  the  type  "higher  — 
lower"  between  ideas  in  order  to  ensure  delivery  of  documents 
concerning  particular  ideas  on  an  inquiry  formulated  in  more  general 
ideas,  and,  thus,  lower  losses. 

2. It was decided to introduce grammatical means very carefully,
only after experimental measurements of noise. At present it has
been decided not to introduce grammar. If noise exceeds 50% as the
array grows, then in the first place one should apparently
introduce grammar of the type of indicators of communication.
Application of role indicators is problematic since available source
materials [2], [3] show that use of role indicators leads to high
subjectivity of indexing and does not lower noise much, while losses
increase considerably.

3. The criterion of semantic conformity of the developed language
is simple: on entry. Basic relationships are considered during
indexing in the following way: if a descriptor included in the
retrieval pattern has a reference to a higher descriptor, then the indexer
includes the higher descriptor in the retrieval pattern. This is
equivalent to an insignificant increase in depth of indexing but does
not complicate the criterion of semantic conformity. The simplicity
of the criterion of semantic conformity is the condition of high
productivity of retrieval.

4. Depth of indexing averages 8-10 descriptors per document.

As results of experiments of Soviet and foreign specialists show
[1] and [4], increasing the number of descriptors in the retrieval
pattern markedly increases noise while fullness increases
insignificantly.

5. Presence of a machine dictionary for translation of the key
words of an inquiry into codes of descriptors inevitably requires a
certain standardization of key words of the dictionary and retrieval
instruction.

The  following  basic  rules  are  accepted: 

a) a majority of key words are separate words of natural language.
Word combination is used only in the case when it is a commonly used
scientific term. It can correspond to the abbreviation, which is
also included in the dictionary. For example: computer - [VM] (ВМ),
memory unit - [ZU] (ЗУ);

b) key words have to be nouns, adjectives, rarely numerals;

c) all words are singular, with the exception of those words
having no singular;

d) adjectives are masculine.

The dictionary was composed on an
array of 1300 abstracts on computer technology from [RZh] (РЖ)
[expansion unknown] journals.

In the process of free indexing of abstracts key words were
selected, they were integrated into classes of conditional
equivalence, and basic links were established in the form of
references "see" to higher descriptors. Upon the termination of this
work the dictionary contained 664 words and 367 descriptors (classes
of conditional equivalence). Then on the basis of the available
dictionary 1060 abstracts were indexed. New words were added to the
dictionary. At present the dictionary contains 702 key words in 404
classes of conditional equivalence.
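
The "see" references and the indexing rule for higher descriptors can be sketched as follows. The descriptor names and the mapping are invented for illustration, and following references transitively (a higher descriptor of a higher descriptor) is an assumption; the text states the rule only for a single step.

```python
def expand_with_higher(descriptors, higher):
    """Add to a retrieval pattern every higher descriptor reachable via 'see'.

    higher maps a descriptor to the descriptor its 'see' reference points at.
    """
    pattern = set(descriptors)
    stack = list(descriptors)
    while stack:
        parent = higher.get(stack.pop())
        if parent is not None and parent not in pattern:
            pattern.add(parent)
            stack.append(parent)
    return pattern


# Hypothetical fragment of the dictionary's "see" references.
higher = {
    "core storage": "magnetic storage",
    "magnetic storage": "memory unit",
}

document_pattern = expand_with_higher({"core storage", "ferrite core"}, higher)

# Simple criterion of semantic conformity: every descriptor of the inquiry
# must enter the retrieval pattern of the document.
inquiry = {"magnetic storage"}
print(inquiry <= document_pattern)
```

Because the expansion happens once at indexing time, the match at retrieval time stays a plain subset test, which is what keeps the criterion of semantic conformity simple.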

From  these  data  it  is  possible  to  trace  the  character  of 
dependence  of  the  value  of  the  dictionary  on  the  volume  of  the 
information  array  (Fig.  2).  From  the  given  graph  it  is  clear  that 
growth  of  the  dictionary  is  considerably  delayed  when  the  array  of 
documents  increases.  This  phenomenon  is  called  dictionary  saturation. 


Fig. 2. Dependence of the volume
of the descriptor dictionary and
the dictionary of key words on the
number of documents in the array:
m is the number of documents in
the array, n' is the number of
descriptors in the dictionary, n is
the number of key words in the
dictionary.

To evaluate the language developed an experiment was conducted
on an initial array of 1300 documents and on a summary
array of 2360 documents. The purpose of the experiment was:

1)  to  determine  noise  factors  and  losses; 

2) to clarify how these indices vary with increase in the
retrieval array. This is necessary to forecast introduction of
grammatical means into the IPYa.

For the experiment 250 inquiries were formulated. They
were composed by specialists not participating in IPYa development.
In every inquiry there were no fewer than three key words, a maximum
of nine, and an average of five key words in an inquiry. On the
basis of these inquiries the coefficient of accuracy was calculated
by the formula [2]:

$\dfrac{R}{L} \cdot 100\%,$

where $R$ is the number of relevant documents in the delivery, and $L$
is the total number of documents in the delivery.

To calculate the coefficient of accuracy retrieval was carried
out for 150 inquiries. For all 150 inquiries 412 documents were
hits, of which 351 were relevant.
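
The arithmetic can be checked with a short sketch. It assumes the reading of 351 relevant documents among 412 hits, which agrees with the 85% accuracy given in the table; the function names are chosen here for illustration.

```python
def accuracy(relevant_in_delivery, total_in_delivery):
    """Coefficient of accuracy in percent: R / L * 100."""
    return 100.0 * relevant_in_delivery / total_in_delivery


def noise(relevant_in_delivery, total_in_delivery):
    """Share of the delivery that does not answer the inquiry, in percent."""
    return 100.0 - accuracy(relevant_in_delivery, total_in_delivery)


R, L = 351, 412          # assumed reading of the experiment's totals
print(round(accuracy(R, L)), round(noise(R, L)))
```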

Analysis showed that for 113 inquiries only relevant documents
were hits, and for 37 inquiries, besides relevant documents,
documents not answering an inquiry were hits.

Of the 37 inquiries 22 drew one unnecessary document each, 10 drew
two, and the other five inquiries each drew 3 or more documents.
After a study of causes of errors it turned out that 7% of informal
noise was caused by indexing deficiencies and 93% by irremovable
noise through false combinations.


Example 1. Inquiry: principle of action of core storage.

The total number of documents issued to the inquiry is six, five
relevant. One document does not answer the inquiry and was issued
as a result of false combinations. The document talks about the
principle of action of thin-film storage and the method of selection
of words with the help of ferrite cores.

Example 2. Inquiry: characteristics of magnetic storage. The
total number of documents issued to the inquiry is 22, 21 relevant;
one superfluous (not answering the inquiry) document is issued via
false combinations. The document talks about characteristics of a
military electronic miniature system with magnetic ZU [storage].

Coefficient of fullness was calculated by two methods: first,
as a percentage of the number of relevant documents in the delivery
to the total number of relevant documents in the retrieval array [2];
secondly, as the ratio of the number of found source documents to
100 inquiries [2]. A source document is the document from which the
inquiry is composed. There are 100 source documents.

In  view  of  the  complexity  and  labor-consuming  character  of  finding 
the  total  number  of  relevant  documents  in  the  retrieval  array, 
coefficient  of  fullness  was  determined  by  the  first  method  for  10 
inquiries.  Coefficients  of  fullness  calculated  by  the  two  different 
methods  give  the  same  result  (see  table). 
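
Both ways of computing the coefficient of fullness can be written down in a few lines. The function names are assumptions, and the sample figure of 92 found source documents is taken from the table's reported result of 92%; the figures passed to the first method are purely hypothetical.

```python
def fullness_first(relevant_delivered, relevant_in_array):
    """First method: delivered relevant documents as a percentage of all
    relevant documents present in the retrieval array."""
    return 100.0 * relevant_delivered / relevant_in_array


def fullness_second(source_documents_found, inquiries):
    """Second method: percentage of inquiries whose source document
    (the document the inquiry was composed from) was found."""
    return 100.0 * source_documents_found / inquiries


print(fullness_second(92, 100))     # 100 inquiries, one source document each
print(fullness_first(23, 25))       # hypothetical counts for one inquiry
```

The second method avoids the labor of judging the whole array for relevance, since each inquiry has exactly one document known in advance to answer it.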


Table. Parameters of IPS effectiveness

                                   Information array    Summary array
                                   on which the IPYa    on which the IPYa
                                   was created          was worked out
                                   (1300), %            (2360), %

Coefficient of fullness
(first method)                     92                   92

Coefficient of fullness
(second method)                    92                   92

Coefficient of accuracy            85                   80

Losses of documents occur because it is difficult during indexing
to take into account all aspects illuminated in the document.


Example. Inquiry: application of thermoregulators for normal
ZU operation. Due to the absence in the retrieval pattern of the
key word "Thermoregulator" the following text was not issued in
answer to this:

Copy No.        Universal Decimal Classification 681.142.652.2

99S7    Stabilization circuit of recording current of
        address circuit of type-Z ZU. "Information
        of inquired sheets." 1965, No. 3441, 3 p.,
        illustrated.

There is described the circuit of stabilization of the address
recording current intended for use in type-Z ZU containing 128
27-bit numbers consisting of (2 x 1.4 x 0.9) [VT-1] (ВТ-1)-type
ferrite cores. The cores operate under the following conditions:
readout current = -(1.2-1.5) A; discharge current of recording
I_razr = [illegible]; address current of recording I_zap = [illegible] A;
fixed bias = 0.3 A. The stabilizing circuit consists of six [P25B] (П25Б)
(or P25A) transistors and regardless of the number of reversed cores
in the numerical rule (load) ensures stable current I_zap = [illegible] A by
way of limiting it by the internal resistance of the circuit. To
ensure normal ZU operation when ambient temperature varies, in the
stabilizing circuit there are used thermoregulators consisting of
two semiconductor thermistors and a diode. The proposed circuit
ensures normal ZU operation in the temperature range from -20 to +60°C.

The results of the experiment are given in the table.

In the process of indexing the information array a study was
conducted of the law of distribution of descriptors in the
retrieval patterns of the documents.

As is known [5] and [5], the frequency of appearance of words
of natural language follows the Zipf law with high accuracy:

$P_i = \dfrac{K}{i},$

where $K$ = const; $i = 1, 2, \ldots, n$ is the reference number of the word
during location in order of decreasing frequency.

More exactly the distribution of the words of natural language
is described by the Mandelbrot law, which essentially generalizes
the Zipf law:

$P_i = \dfrac{K}{(B + i)^{a}},$

where $K$, $B$ and $a$ are constants, where $1 < a < 1.2$.

The study of the real law of distribution of descriptors in
retrieval patterns of documents shows that the Zipf and Mandelbrot laws,
known for distribution of words of natural language, also describe
the distribution of descriptors well.

Figure 3 shows the real law of distribution of descriptors.

From the graph it is clear that this law can be described by the
expression:

$P_i = \dfrac{0.202}{2 + i},$

which fully agrees with the Zipf and Mandelbrot laws.
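
The relation between the three expressions can be made concrete with a small sketch. It assumes the fitted expression reads P_i = 0.202 / (2 + i), that is, the Mandelbrot form with K = 0.202, B = 2 and exponent 1; the function names are chosen here for illustration.

```python
def mandelbrot(i, K, B, a):
    """Mandelbrot law: relative frequency of the item of rank i."""
    return K / (B + i) ** a


def zipf(i, K):
    """Zipf law as the special case B = 0, a = 1."""
    return mandelbrot(i, K, 0.0, 1.0)


def fitted_descriptor_law(i):
    """Fitted law for descriptors, read as K = 0.202, B = 2, a = 1."""
    return mandelbrot(i, 0.202, 2.0, 1.0)


# Predicted relative frequencies for the five most frequent descriptors.
for rank in range(1, 6):
    print(rank, round(fitted_descriptor_law(rank), 4))
```

Compared with the pure Zipf form, the shift B = 2 flattens the head of the curve: the most frequent descriptor is predicted at about 6.7% of all descriptor occurrences rather than dominating the distribution.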


Fig. 3. 1: the real law
of distribution of
descriptors in retrieval
patterns of documents; 2:
curve plotted according to
the law $P_i = \dfrac{0.202}{2 + i}$.

Conclusions

1. Conducted experiments show that the developed IPYa, in
spite of its simplicity of structure, has fully satisfactory
characteristics.

2. Growth of the dictionary is considerably delayed during the
growth of an array of over 2000 documents.

3. Input of grammatical means into the IPYa is inexpedient for
small arrays. The question of needing to introduce grammatical means
into the developed IPYa will be examined after carrying out in 1967
experiments on an array of 15-20 thousand documents.

4. Distribution of descriptors in retrieval patterns of
documents obeys the same Zipf and Mandelbrot laws as words in
natural-language texts.

REALIZATION OF THEMATIC INFORMATION-RETRIEVAL
SYSTEMS ON AN ELECTRONIC
DIGITAL COMPUTER

M. P. Ubiyko

At present patent-licensing work in scientific research
institutes is continuously expanding. Branch patent funds are
growing; the volume of patent-description investigations is growing.

From patent funds there are determined the world technical
level and direction of development with respect to one technical
branch or another. With respect to the same funds numerous
examinations are made for the purpose of determining:

- the patent purity of articles or their components;

- the novelty of developments and inventions;

- the need to develop a new article or the expediency of
obtaining licenses.

These operations take much time of specialists, inasmuch as it
is necessary to examine hundreds and thousands of patent descriptions
in order to make a patent examination. Months are spent making an
average examination. The main part of the time (over 80%) is taken
retrieving patents.

Therefore, the tendency of many specialists, whole organizations,
and institutes to simplify and accelerate these operations is
natural. And apportionment of patents on narrow subjects is the
first step in this direction.

The most essential shortening of the time spent retrieving
patents is given by systematization of the branch patent fund with the
help of well-known tabulyagrams [no translation found]. This method was
developed by the Central Scientific Research Institute of Patent
Information TsNIIPI.


Our organization from the very beginning of formation of the
branch patent fund systematized it with the help of tabulyagrams.

Further operational experience showed that such systematization,
which considerably facilitates and accelerates retrieval of
patents, opens a real possibility for mechanization of
retrieval with the help of the ETsVM, which will reduce still more the
time it takes to retrieve patents. All patent data in tabulyagrams
are encoded by numerals.

In our organization it was possible to realize retrieval on
the "Ural-2."


We mechanized such labor-consuming processes as:

- examination of all patents pertaining to the subject of
interest with respect to the funds of one or several countries;

- sampling and recording of numbers of patent descriptions
from patent funds of one country or a group of countries which
pertain directly to the assigned subject;

- delivery of a typewritten reference with the contents: "On the
technical question interesting you in the country (in the countries)
there are the following numbers of patent descriptions (there are
reported the country and number of patents in the fund of this
country)."


The only thing left for the specialist to do is to obtain the patent descriptions listed in the issued reference from the patent library and to investigate them.

Thus, the specialist poses the question and investigates the patent descriptions found, and all other work is done by the machine.

However, it has been noticed that, of the patents obtained and investigated, the specialist finds only a few of interest to him, and sometimes none at all.

We are now working on mechanizing this process, too, i.e., investigation. For this purpose it will apparently be necessary to train the machine, figuratively speaking, to answer more complicated questions.

To realize the idea of computer retrieval of patents, a volume of work was carried out on programming retrieval on the machine and on the corresponding transfer of the contents of the tabulyagrams to the storage units of the machine.

1. The volume of programming usually depends on the complexity of the tasks required of the machine and on the volume of operations which need to be carried out.

But since in this case the discussion concerns patent information, we are first of all interested in retrieving the patents we need from the whole volume of the branch fund without the participation of the specialist. Therefore, programming was faced with two simple questions:

—  What numbers of patent descriptions are there in one country "X" on the technical question of interest to us (transmission system, form of propeller, and others)?

—  What numbers of patent descriptions are there in any group of countries on the same question or on any other technical question (switches, reduction gears, transistors, and others)?

2. After that, all the patent information of one tabulyagram was split up into groups, so-called zones, each of which was assigned a number. In accordance with the volume of the operational storage unit of the machine, each zone should contain no more than 2048 numbers (4096 instructions).

3. Then a so-called operator's working table was composed, with the numbers of the zones and the addresses for input of information. It had the following columns:

1) numbers in order;

2) numbers of headings according to the tabulyagrams;

3) numbers of zones (recorded in the octal system);

4) addresses of memory cells for every zone (recorded in the octal system).

The initial addresses of all zones start with 1000. The final address is determined by the number of patents in the given zone.

Part of the addresses, from 0 to 1000, is assigned to the program [the source reads "telegram"].
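
The zoning and addressing scheme of steps 2 and 3 can be sketched in a few lines of modern code. This is only an illustration in Python, not the Ural-2 program: the function names, the record representation, and the assumption that each number occupies one memory cell are ours. Only the zone limit of 2048 numbers, the starting address 1000, and the octal notation come from the text.

```python
ZONE_CAPACITY = 2048   # numbers per zone, as stated in the text
ZONE_BASE = 0o1000     # every zone is loaded starting at octal address 1000


def split_into_zones(records, capacity=ZONE_CAPACITY):
    """Split the records of one tabulyagram into consecutive zones."""
    return [records[i:i + capacity] for i in range(0, len(records), capacity)]


def working_table(heading_no, zones):
    """Build rows of the operator's working table.

    Each row holds: running number, heading number, zone number (octal),
    and the first and last memory address of the zone (octal).
    Assumption: one record occupies one memory cell.
    """
    rows = []
    for n, zone in enumerate(zones, start=1):
        last = ZONE_BASE + len(zone) - 1
        rows.append((n, heading_no, f"{n:o}", f"{ZONE_BASE:o}-{last:o}"))
    return rows


if __name__ == "__main__":
    zones = split_into_zones(list(range(5000)))
    for row in working_table(heading_no=7, zones=zones):
        print(row)
```

Because every zone is loaded at the same initial address, only the final address varies from row to row, which matches the remark that the final address is determined by the number of patents in the zone.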

4. For input of the initial information (patent information) of groups of numbers or of a zone into the operational storage unit, standard opaque (35-mm) film was selected as the information carrier.

Such an information carrier is distinguished by its independence, which is very convenient in those cases in which an ETsVM is used mainly for other purposes. Furthermore, the volume of information on such a carrier is practically unlimited.

Information from the tabulyagrams is transferred to film by punching on an external device of the machine (FFCh2).

The following data are recorded on punched tape:

—  zone number;

—  patent number;

—  number of the country to which the patent belongs.

Patent numbers are printed in the decimal system next to the number of the country, also in the decimal system.

The patent number must not exceed eight digits, and the number of the country two digits. These conditions are easy to satisfy.

The printed data are thoroughly collated with the data of the tabulyagram, and then the tape is glued into a loop and is ready for use. It is numbered correspondingly.

Punching a tape carrying the information of one thousand patents takes one man 3-4 shifts. Data of 52 patents are accommodated in one meter of tape. The maximum permissible tape length is 250 m (about 12,000 patents). Input speed is 150-160 patents per second. Tape speed is 2.[digit illegible] m/s.
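
The tape figures quoted above can be checked with simple arithmetic. The sketch below, in Python, uses only the recording density (52 patents per meter) and the maximum tape length (250 m) from the text; the function names are ours. The raw product is 13,000, slightly above the "about 12,000" given in the text; the text does not say what accounts for the difference.

```python
PATENTS_PER_METER = 52   # recording density stated in the text
MAX_TAPE_M = 250         # maximum permissible tape length, in meters


def tape_capacity(length_m=MAX_TAPE_M, per_meter=PATENTS_PER_METER):
    """Number of patent records that fit on a tape of the given length."""
    return length_m * per_meter


def tape_length_needed(n_patents, per_meter=PATENTS_PER_METER):
    """Whole meters of tape needed for n_patents records (rounded up)."""
    return -(-n_patents // per_meter)


if __name__ == "__main__":
    print(tape_capacity())            # records on a full-length tape
    print(tape_length_needed(1000))   # meters for one thousand patents
```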

Order  of  Retrieval  of  Patents  with  the 
Help  of  a  Computer 

The order of retrieval of patents can be understood best of all from the following example. The specialist (designer or developer) asks for a report on what numbers of patent descriptions there are in France on the technical question in which he is interested.

The operator of the section first determines to which tabulyagram, and to which of its zones, the given technical question belongs. From the tabulyagram number he finds the corresponding punched tape. After that the data of the patents of the whole zone to which this question belongs are fed into machine memory. A data-processing program is also loaded. Control is transferred to the beginning of the program.

The program examines all the country numbers. If the number for France is found, the patent number next to this country number is printed on paper. In this way all the information of the given zone is examined, and all patents belonging to the French fund on the technical question asked are printed on paper. All other inquiries of specialists are processed in the same sequence.
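
The scan described above amounts to a linear pass over the records of one zone. A minimal sketch in Python follows; it is not the original program. The record layout (zone, patent number, country number) and the digit limits follow the text, while the names `Record`, `check`, and `retrieve` and the country numbers used in the example are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Record(NamedTuple):
    zone: int      # zone number
    patent: int    # patent number, at most eight decimal digits
    country: int   # country number, at most two decimal digits


def check(rec: Record) -> Record:
    """Reject records that break the digit limits stated in the text."""
    if not 0 < rec.patent <= 99_999_999:
        raise ValueError(f"patent number out of range: {rec.patent}")
    if not 0 <= rec.country <= 99:
        raise ValueError(f"country number out of range: {rec.country}")
    return rec


def retrieve(zone_records: Iterable[Record], countries: Iterable[int]) -> List[int]:
    """Examine every record of the zone and keep the patent numbers
    whose country number is among those requested."""
    wanted = set(countries)
    return [check(r).patent for r in zone_records if r.country in wanted]


FRANCE, BRITAIN = 7, 12   # hypothetical country numbers

if __name__ == "__main__":
    zone = [
        Record(3, 1234567, FRANCE),
        Record(3, 7654321, BRITAIN),
        Record(3, 1111111, FRANCE),
    ]
    print(retrieve(zone, [FRANCE]))            # inquiry for one country
    print(retrieve(zone, [FRANCE, BRITAIN]))   # inquiry for a group of countries
```

Passing a list of country numbers covers both of the questions the programming was asked to answer: one country, or any group of countries.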

Having thus obtained the reference with the numbers of the patents, the specialist analyzes and studies these patents.

The Technical Effect Obtained from the Application
of Machine Retrieval


1. The main effect of the application of machine retrieval of patent descriptions is a considerable acceleration of the retrieval of patents. For example, retrieving 100 patent descriptions on any technical question with the help of tabulyagrams, recording only the numbers of the patents, takes a minimum of 50 min. With the help of the machine this work takes no longer than 4 min (input of program and information 2-3 min and retrieval 1 min).
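
As a worked check of the gain, using only the two times quoted in the text (at least 50 min by hand, at most 4 min by machine), the acceleration is at least twelvefold:

```latex
\frac{t_{\text{manual}}}{t_{\text{machine}}} \;\ge\; \frac{50\ \text{min}}{4\ \text{min}} = 12.5,
\qquad
t_{\text{manual}} - t_{\text{machine}} \;\ge\; 50 - 4 = 46\ \text{min per retrieval of 100 patent descriptions}.
```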

2. The second effect is the release of highly skilled specialists from the retrieval of patents. On their assignment this work can now be carried out by an operator or laboratory technician on a computer.

3. Mechanization of retrieval makes it possible to organize patent-licensing work in step with the flow chart of development of new articles, in all its stages.

Region  of  Application 

Introduction of the machine method of retrieval of patent descriptions is easy to carry out in all organizations having an ETsVM. Practical application of the machine method of retrieval of patents is expedient in those cases in which the frequency of retrievals is high: 10-15 retrievals per day.

Certain difficulties arise in organizations not having computers, since in these cases what has to be solved is not the problem of using the machine for retrieval of patents but the problem of obtaining one, or the problem of cooperation with other organizations having such machines.

Application of ETsVM for bigger patent funds, those of the republics, is unconditionally expedient, but in these cases problems of systematization of the whole fund come up. All patents have to be distributed according to narrowly specialized or other criteria.

This is a question within the competence of the central scientific research institutes.

However, these problems must be solved, since the study of technical levels according to patent-technical descriptions and the carrying out of patent examinations take up too much of the time of highly skilled specialists. Reduction of this time is problem number one, all the more so because the volume of information is continuously increasing.

The contemporary level of development of computer technology permits solving this problem already now and with great effect, which is confirmed by the experience of our organization.

CERTAIN QUESTIONS OF MACHINE PREPARATION OF INDEXES
FOR CURRENT AND RETROSPECTIVE RETRIEVAL OF
SCIENTIFIC AND TECHNICAL
INFORMATION MATERIALS

D. I. Mekhtiyev and E. K. Kuznetsova

In contemporary practice of information service, a more and more considerable place is being occupied by various kinds of indexes prepared with the help of punch-card machines and computers, for example, permutational indexes and varieties of them, indexes of bibliographic references, and indexes of the tabular type.

The possibility of mechanizing almost all stages of the technological process of producing machine indexes ensures the promptness of their preparation, which is especially important if indexes are used as signal information. The reduction of expenditures of manual labor substantially shortens the periods of preparation of such indexes.

Preparation of bibliographic indexes for current and retrospective retrieval of scientific and technical literature in the practice of libraries and organs of scientific and technical information is labor-consuming work requiring large expenditures of time. Automation of the preparation of indexes permits getting current information to the consumer and preparing, at the order of specialists and in the shortest periods, retrospective bibliographic indexes for various branches of science and technology.

This work examines certain questions of the method of creation of a permutational index embracing 2500 sources in the field of development and exploitation of offshore petroleum and gas deposits [remainder of sentence illegible in source]. The index was prepared by the Azerbaydzhan Institute of Scientific and Technical Information of the Gosplan of the Azerbaydzhan Soviet Socialist Republic in conjunction with the All-Union Institute of Scientific and Technical Information of the State Committee on Science and Technology and the Academy of Sciences of the USSR (Division of Semiotics, leader of work G. E. Vleduts).

During preparation of the index the "Ural-4" and punch-card equipment were used in combination.

This index includes various literary sources: journals, collections of scientific and technical information, bulletins, books, articles, and other materials, which cover questions of drilling and exploitation of offshore oil wells, hydrotechnical construction, corrosion of equipment of offshore petroleum industries, economics, and others. It consists of three parts:

1) the index of key words (UKS), located in alphabetical order in the input column in approximately the center of the table, and information codes (right column), which are the input into the other two parts;

2) the bibliographical part;

3) the alphabetical index.

On the left and on the right of the key word in the line there is placed the context of the title, which makes its meaning more definite (see table).

Sample Index of Key Words [in Russian alphabetical order. Right context continues at left margin].

[Sample table not reproduced: the Russian-language index lines, with the key word in the center column and a four-digit information code at the right of each line, are not legible in the source.]

During preparation of the index the following operations were carried out: separating of key words, composition of a list of nonkey words, and editing of titles of materials. In the list of nonkey words ("empty words") we include conjunctions, prepositions, certain forms of auxiliary verbs, certain adjectives and nouns not carrying the basic semantic load in the text of titles, and also the most frequently repeated terms:

Sample list of words excluded from the index.

Rules                     During                    Application
Example                   Principles                Causes
Problem                   Production                Analysis
More                      Struggle                  Future
Faster                    Probable                  Interaction
Influence                 Quantitative              Short
Big                       People                    Measures
Method                    Powerful                  Certain
To the question           New                       Process
For the first time        Guarantee                 Properties
Gigantic                  Regions                   Conjunctions
Data                      General                   State
Action                    Definition                Method
Permissible               Organization              Periods
Problems                  Peculiarities             Point
Regularity                First                     Improvement
View (from the point)     Perspective               Conditions
Change                    Areas                     Section
Study                     Preparation               Factors
Investigation             Indices                   Characteristic
Use                       Position                  Integers (in)
Results                   Therefore                 Stage

Edited  titles  of  publications  were  recorded  on  punched  cards 
according  to  a  preliminarily  developed  mock-up. 

Separating of key words is the most critical part of the work during preparation of a permutation index and should usually be carried out by highly skilled specialists. The quality of this work to a large extent determines the quality of the index.

Furthermore, if the index is overloaded with words not bearing the basic semantic load, then retrieval according to the UKS will be hampered, since the latter will swell considerably.

It is necessary to note that the optimum recording density of a UKS page plays a large role in evaluating the preparation of permutation indexes. The question of the volume of the index is especially acute during processing of large collections of publications. In this case the speed of preparation of permutation indexes, in combination with the economical form of recording, is an advantage of such indexes.

The complexity of apportionment of key words is caused by the fact that the titles of publications for the most part do not meet the necessary requirements, i.e., as a rule they are multiword and uninformative. As examples let us give the following titles:

—  "Research into the problem of origin of oil. Hydrocarbons in contemporary deposits." After editing, the title took this form: "Origin of oil. Hydrocarbons in contemporary deposits."

—  "Application of a conveyer on a barge for the pouring of dam." After editing: "Conveyer on barge for pouring of dam."

In some cases the title does not disclose the contents of the publication at all and must be completely replaced. New key words taken from the text were also introduced into the UKS.

The process of apportionment of key words is considerably simplified in the presence of a ready list of nonkey words and a thesaurus. When permutation indexes are prepared with the use of thesauri and microthesauri on special individual questions of science and technology, human participation in the marking of titles is excluded and, consequently, subjectivism in the selection of key words is removed, inasmuch as a thesaurus covers all the basic descriptors of a branch, taking into account the basic genus-species and associative relationships, and others. Standardization of terminology will also allow eliminating one of the most important deficiencies of the permutation index, the scattering of information, for example as a result of cross-references.
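
How a ready list of nonkey words simplifies the separation of key words can be illustrated with a short sketch in Python. It is not the authors' procedure: the function name is ours, and the stop list below is a small English stand-in that combines a few entries from the sample list above with ordinary prepositions and articles, which the Russian original would handle differently.

```python
import re

# A small stand-in for the list of "empty words". "application", "problem",
# "study", and "method" appear in the sample list; the rest are ordinary
# English function words added here for illustration.
STOP_WORDS = {
    "application", "problem", "study", "method",
    "of", "a", "an", "the", "on", "in", "into", "for", "from",
}


def key_words(title, stop_words=STOP_WORDS):
    """Return the words of a title that are not in the stop list."""
    words = re.findall(r"[^\W\d_]+", title)   # runs of letters only
    return [w for w in words if w.lower() not in stop_words]


if __name__ == "__main__":
    print(key_words("Application of a conveyer on a barge for the pouring of dam."))
```

Applied to the second title quoted above, the filter leaves the same four words that survive the manual editing: conveyer, barge, pouring, dam.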

The first variant of the index obtained on the tabulyagram was thoroughly edited. Because the list of "empty words" was composed with certain assumptions, the index often contained contexts considerably truncated on the left and on the right, which hampered perception of the title. In such cases, terms which in the first editing had been left as key terms without proper basis were omitted. It was necessary to exclude from the UKS the title of the source which started from the dropped key word, and conversely to include in the UKS the title starting with the new key word; all the words were introduced in alphabetical order.

The average title of a book, journal, article, etc., in the index contains 4 key words, and only in rare cases more than that. If the title of a publication has 3-4 key words, then for the most part there is room for it in the index in full, without truncation.

The end of the title, if it fits completely on one line, is marked (=); however, if the title does not fit on one line, the length of which is limited (in this case to not more than 67 characters, not counting the code), then the context is partially broken. The break of context is marked (+).¹ The remaining part of it is considered acceptable and is kept in the index if the final word has at least 4 letters.
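
The truncation rule can be made concrete with a small sketch in Python. It reflects one reading of the rule and is not the original program: the function name and the exact handling of the cut are assumptions, and the marks `=` and `+` follow the text as far as it can be read. A word cut by the line limit is kept only if at least four of its letters remain; otherwise the whole fragment is dropped.

```python
def fit_context(text, width, min_fragment=4):
    """Fit the right-hand context of a title into `width` characters.

    A title that fits receives the end mark "="; a broken one receives "+".
    A word cut off by the limit is kept only if at least `min_fragment`
    of its letters remain.
    """
    if len(text) < width:                  # the text and the end mark both fit
        return text + "="
    cut = text[:width - 1]                 # reserve one position for the mark
    if text[width - 1] != " " and not cut.endswith(" "):
        head, _, fragment = cut.rpartition(" ")   # the cut fell inside a word
        if len(fragment) < min_fragment:
            cut = head                     # fragment too short: drop it
    return cut.rstrip() + "+"


if __name__ == "__main__":
    title = "protection of hydrotechnical constructions"
    for width in (50, 20, 17):
        print(repr(fit_context(title, width)))
```

With the limits 20 and 17 the same title is broken inside "hydrotechnical": in the first case five letters remain and are kept, in the second only two remain and are dropped.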

If it is impossible to decode the title fully because of the truncation of the context, one should find the second key word and, if necessary, subsequent key words of this title. By reading the title in each individual case under a new key word, it is possible to restore the context truncated in one of the variants, since under different key words different words of the context are broken.

The index starts with the list of words excluded from the titles of sources (so-called "empty words").

With more thorough examination and editing of titles it would have been possible to separate an additional series of nonkey words. However, considering the well-known labor consumption of such editing, and also the fact that the presence in the index of an insignificant number of these words will not cause special inconvenience during the use of the index, certain assumptions were made.

After the list of words excluded from the index there is placed the "Index of Key Words." Then there follows a bibliographical list of sources, in which the authors and titles of all sources and other data are shown.

¹The signs given are those used in the test output of the permutation index on development and exploitation of offshore petroleum and gas deposits (360 publications), which was set by the typographical method. The full output of the index has other signs.


Sample Bibliographical Index.

[Sample entries not reproduced: the Russian-language bibliographic entries, listed under four-digit codes such as 0001, 0010, 0016, 0033, 0061, 0113, 0142, and 0148, are not legible in the source.]


On the left there is a column of numbers indicating the code of every work. The index is completed by an alphabetical list of authors.


Sample Author Index [in Russian alphabetical order].

[Sample entries not reproduced: the Russian-language author names, each followed by the four-digit codes of that author's works, are not legible in the source.]

Retrieval is carried out in the following way. In the division "Index of Key Words" the alphabetic list of key words is examined along the vertical, and on the right along the horizontal the information code of the source of interest to the specialist is noted. From the information code, which is the input to the bibliographical part of the index, the name of the source, the year of publication, etc., are found. The author's index also serves for transition to the bibliographical part of the permutation index. As an example there is given the title coded 1404 of one and the same source, consisting of four key words: "Electrochemical protection of hydrotechnical constructions from corrosion," and a diagram of the transition from the index of key words to the bibliographical and author's indexes (Fig. 1, see above).
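
The transition between the three parts of the index is, in modern terms, a pair of lookups keyed on the four-digit information code. The Python sketch below illustrates this with the title coded 1404 from the text. The dictionaries, the function names, the choice of which four words count as key words, and the author name are all hypothetical.

```python
# Bibliographical part: information code -> description of the source.
bibliography = {
    "1404": "Electrochemical protection of hydrotechnical constructions from corrosion",
}

# Index of key words (UKS): key word -> information codes.
# Which four words of the title were marked as key words is assumed here.
key_word_index = {
    "constructions": ["1404"],
    "corrosion": ["1404"],
    "electrochemical": ["1404"],
    "protection": ["1404"],
}

# Author's index: author -> information codes (hypothetical author name).
author_index = {
    "Ivanov I. I.": ["1404"],
}


def by_key_word(word):
    """Key word -> code -> bibliographical entry."""
    return [(code, bibliography[code]) for code in key_word_index.get(word.lower(), [])]


def by_author(name):
    """Author -> code -> bibliographical entry."""
    return [(code, bibliography[code]) for code in author_index.get(name, [])]


if __name__ == "__main__":
    print(by_key_word("Corrosion"))
    print(by_author("Ivanov I. I."))
```

Whichever of the four key words the search starts from, it arrives at the same code and therefore at the same bibliographical entry.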

To make the index more convenient to use and to expand the aspects of retrieval, the permutation index could additionally carry alphabetically coded names of sources, year of publication, and others. In this case, since a large number of the most diverse forms of information sources were used, this possibility was not used. The compilers of the index pursued another goal: to reveal the essence of the permutation index on a diagram simple in structure.

In the index four-digit codes were used, which in the process of machine processing are in all cases placed in the UKS in the column on the right. To make it more convenient to pass from the index of key words to the bibliographical part of the index, in that part the code is given in the column on the left. In the author's index the code is used not for retrieval but for transition to the bibliographical part; therefore, it is in the column on the right.

Recordings were processed on the "Ural-4" according to a specially composed program: reproduction of titles according to the number of marked key words, intramachine sorting of the reproduced titles according to the alphabet of the key words in the input column of the UKS (the 28th), and printing.

For the given permutation index the author's and bibliographical parts were prepared manually. Their processing on an ETsVM does not present any difficulties in principle. In that case all parts (the UKS and the bibliographical and author's indexes) are printed separately after machine processing; at the output of the ETsVM we have three tabulyagrams which can be used directly for subsequent reproduction by the photo-offset method.
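
The three stages of the processing program named above (reproduction of each title once per key word, sorting by the key word in the input column, and printing) can be sketched as follows. This is a minimal illustration in Python, not a reconstruction of the "Ural-4" program. The 28th input column, the 67-character line, and the code at the right come from the text; the function name, the stop list, and the omission of the truncation marks are simplifications of ours.

```python
def permuted_lines(entries, stop_words, key_col=28, width=67):
    """Build the lines of a key-word index.

    entries: iterable of (code, title) pairs.
    Each title is reproduced once per key word, shifted so that the key
    word starts in column `key_col`, cut to `width` characters, and
    followed by the code. Lines are returned sorted by key word.
    """
    rows = []
    pad = key_col - 2                       # characters available for left context
    for code, title in entries:
        words = title.split()
        for i, word in enumerate(words):
            key = word.strip('.,;:"').lower()
            if not key or key in stop_words:
                continue
            left = " ".join(words[:i])
            right = " ".join(words[i:])
            left = left[max(0, len(left) - pad):]     # keep the tail of the left context
            line = left.rjust(pad) + " " + right
            rows.append((key, line[:width].ljust(width) + " " + code))
    rows.sort(key=lambda row: row[0])       # alphabetical order of key words
    return [line for _, line in rows]


if __name__ == "__main__":
    sample = [("1404", "Electrochemical protection of hydrotechnical constructions from corrosion")]
    for line in permuted_lines(sample, stop_words={"of", "from"}):
        print(line)
```

In this sketch every word outside the stop list is treated as a key word, so the sample title yields five lines rather than the four marked by the compilers; with manual marking, `stop_words` would simply be replaced by an explicit list of key words per title.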

Since retrieval of sources by this index relies on visual examination of the titles, the alphabetical list of authors, or the bibliography, much attention should be paid to the layout and printing of the index. During preparation of the test output of the permutation index on the development and exploitation of offshore petroleum and gas deposits, it was proposed to use the tabulyagram for direct photographing, without retyping, and subsequent reproduction by the photo-offset method. However, because additional editing of the tabulyagram was required and the quality of "Ural-4" printing was low, it was necessary to issue the index by the usual typographic method (typesetting on linotype and printing on a flat-bed machine). In order to facilitate visual search, the part of the context to the left of the list of key words and the code part of the index are printed on a colored background.

When the permutation index is prepared as signal information, the tabulyagram should nevertheless be used directly for printing. When an index of literary sources covering a considerable interval of time is composed for retrospective retrieval, the use of traditional forms of printing (in particular, flat-bed printing) is justified. It is true that periods of output in this case are somewhat increased; however, a permutation index issued by such a method is very convenient to use. Considering that one of the main advantages of the permutation index is the short period of its output, efforts should subsequently be directed towards improving the quality of printing at the output of the ETsVM and towards introducing all possible devices and attachments which allow improving the quality of printing of tabulyagrams, and accordingly towards turning to publication of permutation indexes, including those for retrospective retrieval, directly from the tabulyagram by the offset method.

In contrast to foreign permutation indexes, it is proposed to give for every source an index of the Universal Decimal Classification. This will allow partially removing the basic deficiency of the index, connected with the incomplete disclosure of the contents of the source by its title, and will also accelerate the process of subsequent delivery of the source. Furthermore, considering that the most widespread form of indexing of literary sources is the Universal Decimal Classification, a permutation index prepared with indices of this classification will be easy for specialists and library workers to grasp and will be widely used.

In this case there was no intention to use the Universal Decimal Classification as a search system. However, in the permutation index there is a real possibility of using the Universal Decimal Classification for these purposes. The method of retrieval according to the Universal Decimal Classification and the questions of the location of indices and their encoding have to be solved in the process of preparation of the index, with a view to creating maximum convenience for retrieval.

The main advantage of permutation indexes is the great gain in time
during preparation of indexes with the help of the ETsVM. This is
especially noticeable when the time needed for their preparation is
compared with the time of composition of subject indexes in library
practice.

Most of the time in composing a permutation index is spent preparing
data for input by external devices. In particular, time is mostly
expended on punching and subsequently checking the holes on
punch-card equipment. From available data, during mechanized
processing of material [figure illegible] of the time is spent
punching and checking. At the same time, when the ETsVM is applied,
the labor of punching and subsequent checking is 9[?]% of the total
labor needed for machine data processing. It is sufficient to say
that it takes about 17 minutes to prepare a permutation index of
500-400 titles in the presence of 600 nonkey words on the IBM 7090.
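The way a permutation (KWIC) index is formed from titles and a list of nonkey words can be sketched as follows. This is only an illustration under stated assumptions: the function name, the sample titles, the short nonkey list, and the "/" marker for the original start of a title are all hypothetical and are not taken from the program described in the paper.

```python
def permutation_index(titles, nonkey_words):
    """Return KWIC entries (keyword, rotated title, title number), sorted by keyword."""
    stop = {w.lower() for w in nonkey_words}
    entries = []
    for number, title in enumerate(titles, start=1):
        words = title.split()
        for i, word in enumerate(words):
            key = word.strip(".,;:()\"'").lower()
            if not key or key in stop:
                continue  # nonkey words never become index entries
            # Rotate the title so that the keyword comes first;
            # "/" marks where the original title began.
            rotated = " ".join(words[i:] + ["/"] + words[:i])
            entries.append((key, rotated, number))
    entries.sort()
    return entries


# Hypothetical sample data, for illustration only.
titles = ["Drilling of offshore gas wells", "Corrosion of offshore platforms"]
nonkey = ["of", "the", "and", "in"]

for key, line, number in permutation_index(titles, nonkey):
    print(f"{line:<45} {number}")
```

Each title yields one entry per key word, so the size of the index is governed mainly by the nonkey list; this is also why the input (punching) stage, and not the sorting itself, dominates the labor.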

Creation and wide use of permutation indexes will make it possible
to inform consumers of information about the latest publications
promptly and in shorter periods, increasing the effectiveness of
current bibliography and making it accessible to all categories of
specialists.

From the example of a permutation index in the field of development
and exploitation of offshore petroleum and gas deposits it is also
possible to conclude that analogous indexes for retrospective
retrieval of literary sources are highly effective. Creation of such
indexes of literary sources in different branches of science and
technology for defined periods of time will render priceless help to
specialists.


Fig. 2. Tabulyagram UKS.


At  present  we  are  conducting  experiments  for  the  purpose  of 
further  improving  permutation  indexes. 

An index of the "KWOC" type has been prepared with the use of the
"Minsk-22." Experience accumulated in the course of the developments
described permits formulating the question of creating fundamentally
new indexes, based on the indexes examined above, which are absent in
foreign and domestic practice.

MECHANIZED IPS FOR A VARIED FUND DEVELOPED
AND INTRODUCED BY THE UPPER-VOLGA TsBTI

V.  M.  Mesnik 

At the Upper-Volga TsBTI [expansion of acronym unknown], in
accordance with the plan of development of the national economy of
the RSFSR, scientific research was conducted from 1964 to 1965 on
mechanization of processing, retrieval, and reproduction of
scientific and technical information; in 1965 the results were
introduced into the work of the TsBTI.

Naturally, the ideal information service cannot be created
immediately. Up to now no such service exists anywhere in the world.
However, in any case, with application of mechanization the urgent
questions of organization of information service today can be solved
more expediently, profitably, and quickly than without it.

Under  conditions  of  a  regionally  varied  organ  of  information, 
only  domestic  punch-card  equipment  was  recognized  by  us  as  accessible 
and  expedient  for  realizing  the  problem  at  hand  (Penzenskiy  plant 
TEM  [expansion  of  acronym  ambiguous]). 

Reproduction and copying are carried out on domestic copiers and
microfilming equipment. The following are used:

- Punch P80-6;

- Verifier K80-6;

- Sorter S80-5M;

- Installation for microfilming UDM-2;

- Electrographic reproduction apparatus ERA-2F;

- Thermocopier;

- Microphotographer.

As information carrier in the system of mechanized retrieval there
is used an 80-column punched card "Glamekhscheta."

As a basis of the retrieval system for mechanized retrieval under
conditions of a varied fund we assume a language already created:
universal, international, and with relatively well-developed logical
links between ideas, i.e., the language of Universal Decimal
Classification.

And  only  those  ideas  which  are  not  reflected  in  tables  of 
Universal  Decimal  Classification  are  coded  with  the  help  of  digital 
code  of  another  type,  which  is  created  in  the  division.  Brands  of 
machines  and  apparatuses,  their  parameters,  and  certain  other 
characteristics will be thus encoded. Conditionally this code is
called "descriptor," although traditional use of this term has
another meaning.

It is known that Universal Decimal Classification is assumed as a
basis of retrieval language in systems of mechanized retrieval
successfully used in Czechoslovakia, Hungary, and other countries.

However, use of Universal Decimal Classification in a system of
mechanized retrieval involves a number of difficulties: 1) the same
questions have different indices if they belong to different regions
of application; 2) coding of complicated questions allows arbitrary
arrangement of indices corresponding to subjective thinking of
indexers.

All of this in a system of mechanized retrieval must lead to serious
information loss and a high percentage of information "noise."
Therefore, using Universal Decimal Classification, the traditional
system of classification, for mechanized retrieval, we tried to
establish strict methodical rules obligatory during indexing of
materials with respect to four or five aspects. By these rules every
retrieval criterion of a document is always coded in a definite zone
of a punched card. Thus a single scheme of construction of a
complicated and compound index is established for all coders, and
during composition of the retrieval program it is always absolutely
obvious in what zone a given retrieval criterion is coded (see
Appendix).

A  code  for  mechanized  retrieval  includes: 

1.  An  index  of  Universal  Decimal  Classification,  complete  and 
exact,  with  use  of  all  possibilities  of  the  system. 

2. A "descriptor" which expresses brands of equipment and their
parameters, i.e., those characteristics not in Universal Decimal
Classification.

3.  Year  of  publication  of  information. 

Code  for  mechanized  retrieval  takes  up  33  columns  on  an  80-column 
card.  The  remaining  part,  in  which  it  is  possible  to  dispose  1000 
printed  characters,  is  used  for  recording  the  text  of  the  abstract. 

The field for punching the code is divided into five zones, in each
of which a strictly defined part of the code is always recorded.

1) Zone I, 12 columns: technological process and equipment; all
these ideas are expressed by the basic index of Universal Decimal
Classification and special determinants of type 0;

2) Zone II, 6 columns (13-18): specifying characteristics (questions
of checking, regulation, and machine parts), expressed through
determinants of type 00 and of type - (hyphen);

3) Zone III, 9 columns (19-27): object of processing, region of
application, material; all of this is expressed by the index of
Universal Decimal Classification after the ratio sign (:);

4) Zone IV, 5 columns (28-32): brands of machines and apparatuses,
parameters, and other characteristics not in Universal Decimal
Classification. (See mock-up of punched card in Fig. 1.)

Such division of the code into zones, in which each part of it is
always punched in certain columns of the punched card, creates
important advantages during multiaspect retrieval, allowing
inquiries both general and special in nature to be answered.
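The fixed-zone layout can be illustrated with a small sketch in which a card is modeled as an 80-character string. Several things here are assumptions rather than facts from the paper: the year is placed in a single column 33 (the text says only that the code occupies 33 columns), UDC indices are stored as digits without dots, and all sample values and function names are hypothetical.

```python
# Column ranges (1-based, inclusive) following the zone description above.
ZONES = {
    "I": (1, 12),     # process and equipment (basic UDC index)
    "II": (13, 18),   # specifying characteristics
    "III": (19, 27),  # object of processing (index after the ratio sign)
    "IV": (28, 32),   # "descriptor" (make of equipment)
    "V": (33, 33),    # year of publication: assumed to be one column
}


def punch(**fields):
    """Lay the given zone values out on an 80-column card (a string)."""
    card = [" "] * 80
    for zone, value in fields.items():
        first, last = ZONES[zone]
        width = last - first + 1
        if len(value) > width:
            raise ValueError(f"zone {zone} holds only {width} columns")
        card[first - 1:last] = value.ljust(width)
    return "".join(card)


def read_zone(card, zone):
    """Return the contents of one zone without trailing blanks."""
    first, last = ZONES[zone]
    return card[first - 1:last].rstrip()


def select(cards, **criteria):
    """Keep the cards whose named zones match exactly, as a sorter pass would."""
    return [c for c in cards
            if all(read_zone(c, z) == v for z, v in criteria.items())]


# Hypothetical fund of three cards (digits-only encoding is an assumption).
fund = [
    punch(I="6217917503", II="52", IV="0116", V="5"),
    punch(I="6217917503", IV="0129", V="5"),
    punch(I="6217404506", V="4"),
]
welding = select(fund, I="6217917503")   # general inquiry: zone I only
by_make = select(fund, IV="0116")        # inquiry by make: descriptor zone only
print(len(welding), len(by_make))
```

Because every criterion always sits in the same columns, a general inquiry reads one zone and a narrower inquiry simply adds further zones to the same selection.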


If an inquiry is received to issue materials on welding apparatuses,
it is possible to examine the whole array on welding only with
respect to the 1st zone: 621.791.75.03.


If the user is interested in information on automatic welding
apparatuses, the 1st and 2nd zones of the same array are examined.

In order to obtain information on the make of equipment it is
sufficient to examine only the descriptor zone of the corresponding
array.

Let  us  examine  the  method  of  composition  of  the  descriptor 
dictionary: 

When classifying information material in which the make of equipment
is shown, the coder records it in the descriptor dictionary.

In this dictionary every letter is designated by a corresponding
number: A = 01, B = 02, ... K = 10, P = 15, etc. The coded idea is
recorded under its initial letter or figure: the welding automatic
machine ASID-5M under "A," the indicator instrument PKSh5 under "P."

The digital expression of the descriptor consists of the reference
number of its initial letter and the reference number of the given
recording under this letter. Thus, ASID-5M (make of welding automatic
machine) was coded 0116, where 01 is the designation of the letter
"A," and 16 is the reference number of the recording. AT-4-120-1 is
coded 0129 according to the same principle.
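The coding rule just described (number of the initial letter followed by the running number of the recording under that letter) can be sketched as below. The letter table is an assumption: it uses the Russian alphabet without the letters Ё and Й, chosen only because it reproduces the pairs quoted in the text (А = 01, Б = 02, К = 10, П = 15). The class name and the filler entries are hypothetical, and makes beginning with a figure, which the text also allows, are not handled.

```python
# Assumed letter order; it reproduces the pairs given in the text.
ALPHABET = "АБВГДЕЖЗИКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"
LETTER_NO = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


class DescriptorDictionary:
    """Assigns four-digit codes: letter number + running number under that letter."""

    def __init__(self):
        self._entries = {}  # initial letter -> makes in order of recording

    def record(self, make):
        letter = make[0].upper()
        if letter not in LETTER_NO:
            raise KeyError(f"no letter number for {letter!r} in this sketch")
        makes = self._entries.setdefault(letter, [])
        if make not in makes:
            makes.append(make)  # a new recording gets the next running number
        return f"{LETTER_NO[letter]:02d}{makes.index(make) + 1:02d}"


dictionary = DescriptorDictionary()
for n in range(15):
    dictionary.record(f"А-{n}")        # 15 hypothetical earlier recordings under "А"
print(dictionary.record("АСИД-5М"))    # 16th recording under letter 01
print(dictionary.record("ПКШ5"))       # first recording under letter 15
```

Recording the same make twice returns the code already assigned, which matches the role of the dictionary as the single place where codes are issued.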

When the corresponding recording is made in the descriptor
dictionary and the make of equipment is digitally coded, a card is
made out on which there are designated the alphabetical or numerical
expression of the make, the name of the equipment, the code assigned
to it, and the index of Universal Decimal Classification. This card
is placed in the card file of descriptors in strictly alphabetical
order by name of make, without taking the digital expression of the
code into consideration.

During composition of the retrieval program only this card file (and
not the dictionary) is used, since from it it is easy to find the
make of any equipment about which there is information in the fund,
together with the code assigned to it, and from the code the
information is found within several minutes of sorting.

The  index  of  Universal  Decimal  Classification  entered  on  the 
card  indicates  what  division  should  be  examined  on  the  sorter. 


Retrieval time on the sorter is 2-5 minutes. Today this fully
satisfies us, the more so because there is no longer a need for
manual sorting and arrangement of cards, which is labor-consuming and
thankless work.

Another  method  of  preventing  subjective  approach  of  the  indexer 
to  selection  of  key  words  and  a  method  of  their  expression  by 
Universal  Decimal  Classification  is  a  card  file  of  decisions  in  which 
there  are  fixed  all  decisions  made  while  documents  are  being  processed. 
A sample card from the card file of decisions is shown below.

Casting under pressure                 621.74.045
- created by compressed air or gas     621.74.045.3
- equipment                            621.74.045.5.06
- pressing pneumatic cylinder          621.74.045.3.06-222
- created mechanically                 621.74.045.2
- assembly line                        621.74.045.2:658.527

An index is taken as the basis of this alphabetical-subject card
file. However, even here there is a deviation from traditional
library procedures: the key word encountered during indexing of the
document is recorded in the card file, which is the procedure used in
systems of the descriptor type; but in contrast to these systems the
given word is assigned a ready international digital code, the index
of Universal Decimal Classification, which has its own definite place
in the general hierarchical system of classification of technical
ideas and is therefore connected both with a more general technical
idea and with its narrower questions.

Welding apparatus: 621.791.75.05, where 621.791 is welding, and
621.791.75 is electric arc welding.
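The hierarchical link between an index and its more general ideas follows from the decimal notation itself: every shorter prefix of an index is a broader class. A minimal sketch, assuming only that the dots are punctuation and carry no meaning of their own (the function name is hypothetical):

```python
def broader_indices(index):
    """List the broader UDC indices of `index`, nearest first, by shortening it."""
    result = []
    for n in range(len(index) - 1, 0, -1):
        prefix = index[:n]
        if prefix.endswith("."):
            continue  # a trailing dot is punctuation, not a class of its own
        result.append(prefix)
    return result


print(broader_indices("621.791.75"))
```

For the index of electric arc welding the list contains 621.791 (welding), so a card filed under the narrow index can also be reached from the broader one.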




On the other hand, with such a method of management the card file of
decisions acquires an orienting function: presence in the card file
of decisions of a card with the key word is evidence that the
corresponding document is in the fund.

Management of this card file is labor-consuming work, but it
justifies itself, all the more so since the rates of growth of the
fund and of this card file are incommensurable, to which Fig. 2
testifies.


When  in  the  card  file  there  is  reflected  all  or  almost  all  the 
terminology  of  the  given  branch,  the  process  of  indexation  will  be 
essentially  facilitated. 

We experimentally checked a system having an array of 10 thousand
punched cards. Very reassuring results were obtained.

A table of these results is given below.

Zero percent noise: in other words, among the information materials
issued by sorting there was not one superfluous document not
answering the given question.

The  percentage  of  unissued  information  from  what  is  available 
in  the  fund  on  a  given  question  of  information  was  10.8.  In  the 
theory  and  practice  of  scientific  and  technical  information  this 
index  is  considered  very  good. 

Of the 16 punched cards unissued, 12 were not issued because of the
imperfect construction of the sorter. Sampling on a many-valued
criterion is carried out by connecting sockets of the first and
second rows of the setting panel with switching cords, which does not
always ensure reliable contact and does not give the required speed.
This increases the percentage of noise and the percentage of unissued
information.
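The two indices used above, the percentage of noise and the percentage of unissued information, can be written out as a short calculation. The count of 148 relevant cards is an assumption: it does not appear in the text and is back-calculated here only because 16 unissued cards out of 148 gives the quoted 10.8 percent. The function and parameter names are hypothetical.

```python
def retrieval_quality(issued, relevant_issued, relevant_in_fund):
    """Return (noise %, unissued %) for one retrieval experiment."""
    noise = 100.0 * (issued - relevant_issued) / issued
    unissued = 100.0 * (relevant_in_fund - relevant_issued) / relevant_in_fund
    return noise, unissued


RELEVANT_IN_FUND = 148   # assumed, back-calculated from 16 cards and 10.8 %
UNISSUED = 16            # from the text
SORTER_FAULTS = 12       # from the text: unissued because of the sorter

relevant_issued = RELEVANT_IN_FUND - UNISSUED
noise, unissued = retrieval_quality(relevant_issued, relevant_issued, RELEVANT_IN_FUND)
print(f"noise {noise:.1f} %, unissued {unissued:.1f} %")

# Share that would remain unissued if the sorter faults were eliminated.
remaining = 100.0 * (UNISSUED - SORTER_FAULTS) / RELEVANT_IN_FUND
print(f"unissued without sorter faults: {remaining:.1f} %")
```

Under this assumption, three quarters of the loss is due to the equipment rather than to the retrieval language.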

For sampling of information on a many-valued criterion a push-button
setting panel is necessary. A deficiency of the sorter is the fact
that it samples on 12 columns in one run. For retrieval of
information an electronic sorter is necessary which carries out
sampling on all columns at once.

Usually  the  following  arguments  are  made  against  use  of  Universal 
Decimal  Classification  for  mechanized  retrieval: 

a)  technical  difficulties  of  indexation; 

b)  high  cost  of  input; 

c)  impossibility  of  expressing  the  contents  of  the  document 
in  detail; 

d)  subjectivism  of  the  indexer.

How  we  minimize  the  subjectivism  of  the  indexer,  using  Universal 
Decimal  Classification,  has  already  been  discussed  above. 

Indexation by Universal Decimal Classification is a process
requiring close attention and definite knowledge, like any
intellectual labor in general. All engineers of average skill master
it in a comparatively short period and index materials with
sufficient completeness and accuracy and comparatively rapidly,
averaging 25 documents a day.

In this connection it is appropriate to talk about the cost of
input, more exactly, the cost of coding one document. It costs us
15-16 kopecks. All remaining expenditures on input are not related to
the retrieval language.

If one considers that regional reference and information funds are
to a considerable extent made up of cards from central branch
institutes, which come to us with indices of Universal Decimal
Classification, then to place them in the fund for mechanized
retrieval it is necessary only to convert the index into code. This
takes minutes, provided, of course, that the central branch
institutes treat indexation with sufficient conscientiousness. Now,
unfortunately, this is not observed.

Regarding the degree of fullness and accuracy of coding: although in
principle systems of the descriptor type should in this respect have
evident advantages over a system of the classification type
(Universal Decimal Classification), in those systems with which we
were able to become acquainted there is not yet any such advantage:
their lexical composition is too poor, and logical links are almost
undeveloped. Moreover, all of them are created to serve a fund on a
more or less narrow question. For a regional fund a universal system
is needed, and among such systems Universal Decimal Classification is
undoubtedly the best.

What are the methods of processing, storage, retrieval, and delivery
of information used in SIF TsBTI?

In the fund of SIF there is put information both proceeding "from
the bottom upwards" and obtained "from the top downwards."

According to the information cards of enterprises of three regions,
workers of the branch divisions and of SIF TsBTI compose monthly
lists of incoming information cards. They are printed on a rotaprint
in the form of pamphlets which can be cut into separate cards for
convenient filing in the fund.

Information is printed on the "Optima" (IGV-2) typewriter and
carries the following data:

a) name of innovation,

b) by whom developed,

c) date and place of introduction,

d) source of information,

e) annotation,

f) economic effect.

After reproduction of the lists on printer-copier equipment and
distribution of them to the enterprises of the economic region, the
first copy is cut up and transferred on the ERA-2F to punched cards.

In a shift one ERA-2F transfers information to 500 punched cards.
For information materials entering the TsBTI from other TsBTI and
central branch institutes, a punched card with a bibliographic
description is placed in the fund.

The  cost  of  one  punched  card  with  transferred  text  of  information 
is  1.8  kopecks. 

Punched cards with transferred information are punched.

The punched cards are placed in the receiving pocket of the P80-6
punch, and the operator punches the assigned code.

The productivity of the punch is 1600 punched cards per shift.

The punched cards are checked.

They are checked on the K80-6 verifier, on the keyboard of which the
code is keyed in a second time according to the punching mock-up.

Correctness of punching governs the result of subsequent retrieval
of the requested information.

The productivity of the verifier is 1600 punched cards per shift. We
do not use it completely.

Information in the fund is retrieved in the following order: in
accordance with the subject of the inquiry a program for mechanized
retrieval is composed with respect to the card file of decisions and
the card file of descriptors.

The sorter examines the corresponding array of punched cards, the
documents issued by it are analyzed by an engineer, and then
copier-reproducer or microfilming equipment makes copies in 2-3 min,
which are sent to the user.


Operational  Communication 

For operational use of the latest achievements of science and
technology there are necessary not only mechanization of preparation,
retrieval, and delivery of information, but also its fast
transmission to users.

According to measures provided for by resolution No. 845 of the
Council of Ministers of the USSR of 1 August 1965, organs of
information are obliged to organize the obtaining and operational
transmission of information with the use of the latest instruments
and equipment of communication.

In the Upper-Volga TsBTI since March 1964 there has been teletype
communication between the SIF division and the DNTIP in the cities of
Yaroslavl, Vladimir, and Kostroma. Communication has been established
with many enterprises of the economic regions.

At present teletype communication has been established with many
republic institutes and with the TsBTI of many economic regions of
the country. Teletype communication considerably shortens the time it
takes to obtain and deliver information.

If the time it takes to obtain information (inquiry to answer) by
letter is 10-15 days at best, then with the help of teletype
communication it is shortened to 2-5 h (and in certain cases to
20-50 min).

In 1965, 510 references were obtained and issued by teletype. Still
wider use of teletype communication for the needs of information is
contemplated.


Conclusion


1. The IPS developed by us has been applied in the practical
activity of the varied SIF for two years.

Mechanized retrieval and delivery of information with the use of
comparatively cheap punch-card equipment considerably reduces the
time necessary for retrieval of documents and facilitates the labor
of workers of the section.

And although retrieval is carried out according to a limited number
of retrieval criteria (4-5), in practice in the overwhelming majority
of cases we are satisfied with its results.

APPENDIX

TEMPORARY INSTRUCTION ON SEPARATE QUESTIONS OF INDEXATION
BY UNIVERSAL DECIMAL CLASSIFICATION AND CODING OF
INFORMATION MATERIALS FOR MECHANIZED
RETRIEVAL

During coding of information materials one should observe the
following rules:

1. The basic index of Universal Decimal Classification placed in the
1st zone should express the technological process or equipment
discussed in the information, and the part of the index which comes
after the ratio sign (i.e., in the 3rd zone) determines the object of
processing.

For example: 1. Dyeing of fabrics made of synthetic fibers
677.842:677.494.064.

2. Milling machine for worm cutting of straight-tooth bevel gears
621.914.5:621.835.22.

Only  in  those  cases  in  which  it  is  impossible  to  express  the 
technological  process  by  the  basic  index  of  Universal  Decimal 
Classification  is  the  object  of  production  placed  in  the  first  zone, 
and  the  process  and  equipment  designated  by  determinant  .002... 

For example: 1. Zapletka [no translation found] of cables in
production of truck-cranes 621.86.065.5.002.72.

2. If the contents of the information can be expressed only with the
use of two ratio signs, one should make paired cards (the first idea
with respect to the second and the first idea with respect to the
third). For example: "Sharpening" of a drill with an artificial
diamond 621.925.6:621.951.4:666.235.

code:  1)  621.923.6^^621.951.4 . 6 

2)  621.925.6^^666.253 . 6 
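Rule 2 amounts to splitting an index with two ratio signs into two cards, each pairing the first idea with one of the others. A minimal sketch follows; the function name is hypothetical, and extending the rule to more than two ratio signs is an assumption not stated in the instruction.

```python
def paired_cards(index):
    """Split a UDC index with several ratio signs into first:other pairs."""
    parts = [p.strip() for p in index.split(":")]
    if len(parts) <= 2:
        return [":".join(parts)]      # zero or one ratio sign: a single card
    first = parts[0]
    return [f"{first}:{other}" for other in parts[1:]]


# Index as printed in the example of rule 2.
for card in paired_cards("621.925.6:621.951.4:666.235"):
    print(card)
```

Each resulting index has exactly one ratio sign, so its two halves fit the 1st and 3rd zones of an ordinary card.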

3.  In  a  number  of  cases  very  important  technical  ideas  on 
Universal  Decimal  Classification  can  be  expressed  only  through  a 
determinant of type .0.

QUESTIONS OF DEVELOPMENT OF AN INFORMATION-RETRIEVAL
SYSTEM FOR AUTOMATED CONTROL OF THE WORK OF A
NAVAL FLEET OF STEAM NAVIGATION

A.  N.  Kiselev  and  I.  A.  Mikhaylova 


Further improvement of the control system of a naval fleet of steam
navigation is possible only on the basis of introduction of
mathematical methods and means of computer technology and by way of
gradual transition to a system of automated control (SAU). Such a
system can take over the following basic functions:

a) automated collection, processing, and storage of information
about the state of the transport process;

b) appraisal of the situation which has arisen in the transport
process and development of recommendations on its regulation;

c) machine preparation of initial data and solution of concrete
problems of control of the work of the fleet;

d) preparation and delivery of answers to inquiries of the
management apparatus and of reports on set forms, etc.

Naturally, the SAU will be created in several stages, distinguished
by the range of problems solved, the degree of automation of input
and data processing, the structure of controls, etc.

The material of the present report is part of a large overall
research effort on information provision of the SAU. This complex
also includes establishing the composition of information
participating in solution of the problems of control of the work of
the fleet, its standardization and unification, development of
extramachine and intramachine languages and forms of presentation of
information, organization of transmission of information over
different communication channels, development of well-founded
requirements for the composition of technical means, and provision of
the necessary sequence and continuity of processing and transmission
of information.

The  information-retrieval  system  (IPS)  for  automated  control 
of  the  work  of  the  fleet  is  developed  in  the  following  sequence: 

1.  Analysis  of  the  complex  of  problems  of  control,  their 
information  communications,  and  peculiarities  of  solution. 

2. Foundation of SAU in the form of an IPS variant. Peculiarities
caused by assignment and conditions of work of the system.

3. Development of a system of inquiries and IPS (composition of
current forms, inquiries of management apparatus, preparation of
initial data for solution of concrete problems of control).

4. Classification of objects and their characteristics.
Determination of composition and volume of arrays of information
(information tables).

5. Development of principles of organization of storage and
restoration of information.

6. Development of method of distribution of information in memory
unit (ZU).

7. Development of algorithms and programs of input and restoration
of information.

8. Development of algorithms and programs of preparation of initial
data of solution of problems of control of the work of the fleet.

9. Development of algorithms and programs of answers to inquiries.

The economicomathematical model of the system of automated control
of the work of a fleet of steam navigation has more than 80 different
problems, which can be conditionally divided into three basic groups.

The  first  group  includes  annual  and  quarterly  planning  of  work 
of  fleet,  loading  of  ports  and  ship  repair,  composition  of  the 
optimum  diagram  of  travel  and  arrangement  of  vessels  by  lines  and 
directions,  etc. 

The second group consists of problems of operational planning and
regulation of the work of the fleet, development of a monthly work
schedule of vessels, scheduled assignment to a concrete vessel,
appraisal of the daily situation and submitting recommendations on
readdressing of vessels, etc.

In  the  third  group  there  are  problems  of  operational  and 
statistical  calculation  and  analysis  of  the  work  of  the  fleet. 

For solution of a large complex of problems on control of the work
of the fleet it is necessary to process and store in the ETsVM a
large volume of information: plans of works, data about the current
state of the transport process, position of vessels, state of ports,
presence of cargoes, technical characteristics of vessels and ports,
tariffs on transport, norms of treatment of vessels in ports, data
about completed trips, etc.

Analysis of the information participating in the solution of the
complex of problems of automation of control of the work of the fleet
shows that the problems are closely linked by information. In a
majority of problems the same initial and intermediate data are used,
the results of one problem are source data for solution of other
problems, etc. Thus, for example, in the problems of composition of a
chart of work of vessels, scheduled assignment, composition of a
cargo plan, and readdressing of vessels there participate such data
as load capacity, load-carrying capacity, speed of movement of the
vessel, distance between ports, tariffs on transport of cargoes, etc.

Experience in solving individual problems shows that manual
preparation of initial data for solution of problems of control,
their retrieval in documents, arrangement in a definite sequence,
punching, and input take 90-95% of the time spent solving the
problem.

Part  of  the  problems  intended  for  daily  solution,  have  limita¬ 
tion  with  respect  to  time  of  solution.  The  first  of  these  is  the 
problem  of  regulation  of  the  work  of  the  fleet,  which  must  be  solved 
in  a  few  hours.  Furthermore,  daily  there  will  be  solved  the  problem 
of  composition  of  scheduled  assignment  for  vessels  finishing  a  trip, 
composition  of  a  cargo  plan  of  vessels  before  loading,  delivery  of 
data  on  operational  account,  and  delivery  of  reference  data  on  the 
state  of  transport  process  in  different  aspects.  Manual  preparation 
of  Initial  data  for  solution  of  these  problems  requires  large 
expenditures  of  working  time  of  technical  personnel  and  loads 
auxiliary  equipment .  ' 

The biggest effect in this case can be obtained if the system
of information is formed taking into account the interaction of the
problems and their information flows.

During the analysis of the complex of problems of control,
besides establishing the information links between them and the
conditions of their solution, the composition of the necessary
information is determined. For every element of information (index)
the corresponding characteristics are indicated (unit, accuracy of
measurement, maximum value, periodicity of changes, etc.).

Storage of information and its distribution in the memory units
of the computer can be organized in two ways.

In  the  first  case  immediately  after  input  special  arrays  are 
formed  each  of  which  is  useful  for  solution  of  a  certain  problem. 
Creation  of  such  arrays  is  acceptable  if  the  complex  of  problems  is 
known  in  advance  and  small.  If  the  complex  of  problems  is  large, 
there  will  be  a  greater  number  of  arrays,  where  the  same  data  can 
appear  in  a  number  of  arrays,  which  creates  certain  difficulties. 

At present certain problems use a small volume of data and are solved
for a concrete group of vessels or individual vessels, for example,
the problem of composition of the scheduled assignment and the
composition of the cargo plan of the vessel. Big steamship companies
unite hundreds of vessels; therefore, the composition of arrays of
information for every problem and in reference to concrete vessels
and ports is bulky and irrational.

A second, more rational method is presentation of SAU in the
form of an information-retrieval system and organization of
information in the form of information tables.

In basing the SAU on an information-retrieval system, use was made
of the theoretical developments of N. A. Krinitskiy, O. A. Abramov,
and others [1, 2, 3].

Information systems describe the state of certain objects in
their interconnection. An information system is a set of sources
and a depository of information, a collection of algorithms of
selection of information, and an information-carrying system.

An information control system of a fleet can be considered a
particular case of automated systems possessing a number of specific
properties governed by the purpose and conditions of its work. The
system of information provision of SAU has the features of a
dynamic information-retrieval system:

1)  restoration  of  information  about  the  state  of  objects 
participating  in  the  transport  process  and  conditions  of  shipments; 


2)  distribution  of  information  according  to  a  definite  law; 

3)  preparation  of  initial  data  for  solution  of  problems  of 
control; 

4)  composition  of  answers  to  inquiries. 

The  first  basic  peculiarity  of  this  IPS  is  limited  composition 
of  information  elements  in  it.  The  main  elements  of  the  transport 
process  are  vessels,  ports,  and  loads.  Knowing  the  perspective  of 
development  of  the  fleet,  it  is  possible  to  count  the  assigned  number 
of  vessels  of  steam  navigation  in  a  certain  interval  of  time,  for 
example,  a  year.  The  composition  of  world  ports  changes  more  and 
more  rarely.  The  product-list  of  transported  cargoes  also  changes 
rarely. 

A majority of inquiries to the information system are known
beforehand and can be rigidly programmed. Obtaining an answer then
amounts to delivery of information according to a standard program.
There is no need in this case to translate the inquiry from input
language into information language, since information at the input
to the system will be basically presented in standardized language.
Information should be issued in convenient human-readable form,
for example, in standardized Russian.

The  information-retrieval  system  will  conduct  a  calculation  of 
the  work  of  the  fleet,  both  operational  and  statistical,  practically 
without  documentation,  having  freed  management  from  unnecessary 
tiresome  work  on  data  processing  and  composition  of  various  reports 
and  references. 

Therefore,  an  important  question  during  development  of  IPS  is 
analysis  of  the  work  of  management  apparatus  of  steam  navigation 
and  problems  of  control  and  investigation  of  inquiries. 

Such inquiries include delivery of information about the position
of vessels of the steamship company at a definite moment of time,
the presence of vessels in a certain region, daily or monthly
transported loads, fulfillment of the scheduled assignment by a
vessel, the budget of time of vessels by calendar period, etc.


Every object of the control system can be described by a set
of characteristics X_1, ..., X_n. Moreover, concrete values of these
characteristics are stored in the information system.

The characteristics of an object are not only numerical values of
any parameter of the object, but also other data determining the
participation of this element in the transport process. In this
case the concrete value of a characteristic can be represented not
only by a numerical value, but can also have the form of a certain
alphabetical equivalent, a word, or even a word combination.
Information about the objects of the system is in the form of
information tables. Objects are grouped into information tables,
taking into account convenience of retrieval of data during solution
of a problem.

Information systems are created for production of information
about a certain set of objects M. Every object should be assigned
a list of properties. The set M of objects is divided into several
nonintersecting classes. Every class can also be split into
subclasses, sub-subclasses, etc.

Objects  of  every  class  and  their  descriptions  are  simultaneously 
numbered  in  such  a  way  that  objects  of  zero  rank  are  numbered  first 
(lowest  degree  of  classification),  then  of  the  first  rank,  etc. 

In our case we are dealing with a small number of types of
objects (vessels, ports, cargoes, clientele, adjacent forms of
transport, etc.).

One of the labor-consuming problems in this plan is classification
of objects (elements) of the information system and also the
establishing of characteristics of objects and their classification.
Objects and characteristics must be classified proceeding from
convenience of solution of problems and retrieval of necessary
information. The set containing the greatest number of
characteristics is the set of information about vessels. Vessels
that will be processed by the ports of a given steamship company,
i.e., that enter the sphere of its control, will be both domestic
and foreign. In turn, every group is divided into cargo,
cargo-passenger, and passenger vessels. Every class is divided into
smaller subclasses. The smallest unit of classification is a group
of vessels of a definite type, for example, of the type "Poltava,"
"Krasnograd," etc., having many identical characteristics. These
types take into consideration design features and adaptability to
transport of definite cargoes. Every vessel in the zero class should
possess a certain formal criterion (index), which takes this
classification into consideration. Many of the problems of
operational regulation of the work of the fleet are solved for
interchangeable hawsers and vessels. Therefore, in the process of
solving a problem the machine should determine from the formal
criterion (index) what vessels can be used in every concrete case.
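
The role of the formal criterion (index) can be illustrated with a
minimal sketch. The index layout (registry, purpose, and type joined
by dots), the vessel names, and the function name are assumptions
made for illustration; the source does not specify the coding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vessel:
    name: str
    index: str  # hypothetical layout: "<registry>.<purpose>.<type>"


def usable_vessels(fleet, class_index):
    """Return names of vessels whose index equals or falls under class_index."""
    wanted = class_index.split(".")
    return [v.name for v in fleet if v.index.split(".")[:len(wanted)] == wanted]


# Illustrative fleet; names and indices are invented for the example.
fleet = [
    Vessel("vessel-1", "domestic.cargo.Poltava"),
    Vessel("vessel-2", "domestic.cargo.Poltava"),
    Vessel("vessel-3", "domestic.cargo.Krasnograd"),
    Vessel("vessel-4", "foreign.passenger.type-X"),
]

print(usable_vessels(fleet, "domestic.cargo.Poltava"))
print(usable_vessels(fleet, "domestic.cargo"))
```

Matching on a leading part of the index lets one inquiry address a
single type, a whole class, or a whole group of vessels.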

The second important group of objects, and a rather large one,
is ports. In this group there are all ports in which the vessels
of a given basin are processed: domestic ports of the given basin
and other basins, and also foreign ports. Every group of ports is
distinguished by the composition of necessary information.

While classification of vessels and ports is rather simple,
classification of other objects and sources of information of the
system and formalization of their characteristics present
difficulties. This pertains especially to loads. For this purpose it
is necessary to develop anew a single product-list of cargoes, which
is needed because at present different product-lists exist: one for
determination of tariffs on transport of cargoes in export-import and
cabotage, others for standardization of loading and unloading
operations, and still others for capture of cargo collection. All
these product-lists have noncoincident designations of cargoes, and
to store all product-lists in the machine is inexpedient. Therefore,
a new product-list of cargoes should include all necessary indices
in the form of characteristics.

Every information table consists of three parts:

cap (header) of objects,

cap (header) of characteristics,

table of values.

The principle of composition of an information table is well
known [5]. In the beginning, designations of all objects of class M
are recorded and numbered. Then designations of all characteristics
(criteria) describing the state of the objects of the given class
are recorded in a certain sequence (an ordered list of criteria is
composed), and they are numbered.

In the first line of the information table the criteria are recorded
in the order of their numbers. In the second line, under every
designation, there are recorded the values of the given
characteristic belonging to object No. 1. In the third line there
are placed the values of the characteristics for object No. 2, etc.

The cap of objects is a vertical cap since every designation of
an object corresponds to a line of the information table. The cap of
characteristics is horizontal because every designation of a
characteristic corresponds to its concrete value for every object.
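
The three parts of an information table can be sketched as follows.
This is a minimal illustration under assumed names; the class,
method, and characteristic names and the sample values are
hypothetical and do not come from the source.

```python
class InformationTable:
    """An information table: two caps plus a table of values."""

    def __init__(self, objects, characteristics):
        self.objects = list(objects)                  # vertical cap: one line per object
        self.characteristics = list(characteristics)  # horizontal cap: one column each
        # Table of values, initially empty (None) for every cell.
        self.values = [[None] * len(self.characteristics) for _ in self.objects]

    def put(self, obj, characteristic, value):
        row = self.objects.index(obj)
        col = self.characteristics.index(characteristic)
        self.values[row][col] = value

    def get(self, obj, characteristic):
        row = self.objects.index(obj)
        col = self.characteristics.index(characteristic)
        return self.values[row][col]


# Values may be numbers or words, as the text notes.
table = InformationTable(["vessel-1", "vessel-2"], ["speed_knots", "engine_type"])
table.put("vessel-1", "speed_knots", 15)
table.put("vessel-1", "engine_type", "diesel")
table.put("vessel-2", "speed_knots", 17)
print(table.get("vessel-1", "engine_type"))
```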

In  caps  of  objects,  besides  ideas  expressing  the  name  of  concrete 
objects  or  their  coded  equivalent,  there  are  given  links  of  entry  of 
classes  of  some  ideas  into  classes  of  other  ideas. 

It is necessary to note that characteristics can also be
classified, i.e., divided into classes, subclasses, etc., by ranks.
For example, such a first-rank characteristic of a vessel as
"coordinate" is divided into two characteristics of zero rank,
"latitude" and "longitude." Therefore, in caps of characteristics
also, besides the names of characteristics, there are given links of
entry of characteristics of junior ranks into classes of senior
ranks and their links with the matrix. This is done for convenience
of retrieval and delivery of a reference. During retrieval of
information there should be assigned the rank of the idea of the
object or characteristic by which the necessary values are selected.
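
The links between ranks in a cap of characteristics can be sketched
as a small mapping that is expanded down to zero rank. Only the
"coordinate" example comes from the text; the second-rank entry
"location" and the name "region" are hypothetical additions used to
show more than one level.

```python
# Links of entry: a senior-rank characteristic -> its junior-rank members.
LINKS = {
    "coordinate": ["latitude", "longitude"],   # first rank -> zero rank (from the text)
    "location": ["coordinate", "region"],      # hypothetical second-rank characteristic
}


def zero_rank(name, links):
    """Expand a characteristic into the zero-rank characteristics it covers."""
    members = links.get(name)
    if not members:
        return [name]          # already of zero rank
    result = []
    for member in members:
        result.extend(zero_rank(member, links))
    return result


print(zero_rank("coordinate", LINKS))
print(zero_rank("location", LINKS))
```

An inquiry stated at a senior rank can then be answered by selecting
the columns of all zero-rank characteristics it covers.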

In one information table one should place data about objects
the majority of whose characteristics have identical names, and
the convenience of retrieval of information for solution of problems
must also be considered. Thus, for example, it is expedient to have
separate information tables for oil, passenger, and dry cargo
fleets.

The information table for cargo vessels of the dry cargo fleet
includes technical characteristics of vessels (carrying capacity,
load-carrying capacity, speed of movement, type of engine, number
of holds, hatches, etc.), normative data (norms of consumption
of water, fuel under way and at berth, prime cost of vessel, norm of
consumption of currency, etc.), plan of work of vessel in current
trip (scheduled assignment), location of vessel and form of fulfilled
operation by last operational data, fulfillment of scheduled
assignment, and data about fulfillment of preceding trips (time of
trip, designation and quantity of transported load, material and
financial consumptions, sum of prize, etc.). The vertical cap of
this table is of the third rank and contains around five hundred
designations of characteristics. The volume of the horizontal cap
depends on the number of vessels of the steamship company and can
also contain more than one hundred designations.

There may also be cases in which retrieval finds none of the
ideas of the object or characteristic. In this case special criteria
are generated, equivalent to the expressions "there is no assigned
idea in the cap" and "there are no ideas of zero rank subordinated
to the assigned idea in the cap." If a certain characteristic is
inapplicable to the description of a certain object, the criterion
"does not have meaning" is introduced, which is put in place of the
concrete value of the characteristic. Furthermore, there is
introduced the service criterion "there is no information," which
is used if information about the value of the given characteristic
for any reason did not enter the information system.
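
A minimal sketch of how such special criteria might be represented
follows. The enumeration and function names, the dictionary layout,
and the sample values are assumptions made for illustration; only the
quoted wording of the criteria comes from the text.

```python
from enum import Enum


class Special(Enum):
    NO_SUCH_IDEA = "there is no assigned idea in the cap"
    NO_ZERO_RANK = ("there are no ideas of zero rank subordinated "
                    "to the assigned idea in the cap")
    NO_MEANING = "does not have meaning"
    NO_INFORMATION = "there is no information"


def lookup(table, obj, characteristic):
    """Return the stored value, or a special criterion generated at retrieval."""
    if obj not in table or characteristic not in table[obj]:
        return Special.NO_SUCH_IDEA      # generated, not stored
    return table[obj][characteristic]    # may itself be a stored special criterion


# Hypothetical table: a tanker has no passenger seats; its speed was never entered.
table = {
    "tanker-1": {
        "load_capacity_t": 12000,
        "passenger_seats": Special.NO_MEANING,
        "speed_knots": Special.NO_INFORMATION,
    }
}

print(lookup(table, "tanker-1", "passenger_seats").value)
print(lookup(table, "tanker-9", "speed_knots").value)
```

Keeping "does not have meaning" distinct from "there is no
information" lets a program tell an inapplicable cell from a missing
one.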

Thus, the information system is in the form of information
tables. The concrete value of a characteristic is found by the
information retrieval algorithm, for which certain initial data
must be assigned: the number of the information table, the number
of the line of the information table (name of object), and the kind
of concrete characteristic of the object corresponding to the name
(number of column).

Algorithms of solution of concrete problems developed at present
do not provide for machine preparation of initial data and, therefore,
require large expenditures. During automation of control of the
transport process it is necessary to liberate maintenance personnel
from labor-consuming work on data processing. Subsequent organization
of solution of the problem in a system of information provision
is possible.

Algorithms  of  solution  of  problems  of  control  of  the  transport 
process  will  be  developed  beforehand,  and  programs  of  their  solution 
will  be  presented  in  the  form  of  a  library  of  standard  subroutines. 

The  solution  of  every  problem  is  divided  into  two  stages: 

—  preparation  of  initial  data; 

—  solution  of  the  problem  according  to  obtained  data. 
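
The two stages can be sketched as a small library of routines, each
declaring the initial data it needs. The problem name, the
characteristic names, and the sample figures are hypothetical; the
sketch only illustrates separating data preparation from solution.

```python
def prepare_initial_data(store, obj, needed):
    """Stage 1: retrieve the named characteristics of one object."""
    return {name: store[obj][name] for name in needed}


def trip_time_hours(data):
    """Stage 2 routine: a deliberately simple example problem."""
    return data["distance_nm"] / data["speed_knots"]


# Library of standard subroutines: problem name -> (needed data, routine).
LIBRARY = {
    "trip_time": (["distance_nm", "speed_knots"], trip_time_hours),
}


def solve(problem, store, obj):
    needed, routine = LIBRARY[problem]
    data = prepare_initial_data(store, obj, needed)   # stage 1
    return routine(data)                              # stage 2


# Invented sample data for one vessel.
store = {"vessel-1": {"distance_nm": 1200.0, "speed_knots": 15.0, "holds": 5}}
print(solve("trip_time", store, "vessel-1"))
```

Because each routine only names the data it needs, the preparation
stage can change with the organization of the depository without
touching the solution routines.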

Therefore, programs of preparation of initial data have to be
separated into a definite group of problems. In each program of
preparation of initial data there participate information-retrieval
algorithms which depend on the organization of the information
depository.

For simplification of distribution of information about the
transport process a certain assumption is made in our case: let us
keep in mind that the number of system elements is constant.
Furthermore, let us take into consideration that the number of
characteristics necessary for description of the properties of every
object is also known.

Knowing the limits of change of the contents of the
characteristics, it is possible to establish the volume of
information which will describe the state of the system in a certain
interval of time.

An important question of investigation is development of a
method (language) of presentation of information inside the machine,
ideas of caps of objects, and characteristics and their values.

Closely connected with this is construction of classificational
links between objects and characteristics and also application of
methods of distribution of information in memory units.

The main part of the information will be stored on magnetic
tape in linear form. In this sense the information table should be
placed line by line or column by column. In our case it is expedient
to place information line by line, forming separate arrays for every
object.

To ensure compact recording of information in memory, several
special procedures are used. The simplest is the position
principle of distribution. In this case for every object there is
assigned a strictly defined volume of memory for storage of
information about it. In the process of work the contents of cells
storing variable information will change, while the character
of information and the cells assigned for its storage remain
constant. For the value of each characteristic there is assigned a
constant number of binary digits corresponding to its maximum value.
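
The position principle can be sketched as fixed-width bit packing. The
maximum values and function names below are hypothetical; the sketch
assumes non-negative integer values and gives each characteristic just
enough binary digits for its maximum.

```python
def bits_for(max_value):
    """Number of binary digits needed to hold values 0..max_value."""
    return max(1, max_value.bit_length())


def record_bits(max_values):
    """Fixed volume of memory (in bits) assigned to every object."""
    return sum(bits_for(m) for m in max_values)


def pack(values, max_values):
    """Place each value in its fixed position within one integer word."""
    word = 0
    for value, maximum in zip(values, max_values):
        if not 0 <= value <= maximum:
            raise ValueError(f"{value} is outside 0..{maximum}")
        word = (word << bits_for(maximum)) | value
    return word


def unpack(word, max_values):
    """Recover the values by reading the fixed positions back."""
    values = []
    for maximum in reversed(max_values):
        width = bits_for(maximum)
        values.append(word & ((1 << width) - 1))
        word >>= width
    return values[::-1]


max_values = [31, 20000, 500]            # invented limits for three characteristics
word = pack([17, 12000, 0], max_values)
print(record_bits(max_values), unpack(word, max_values))
```

With a constant number of objects, the total volume is the number of
objects multiplied by `record_bits`, which is how known limits of the
characteristics fix the volume of stored information.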

The main drawback of such a method is uneconomic use of memory
when there are many empty values of characteristics. Other methods
of distribution take into account only nonzero values of
characteristics, which are recorded sequentially one after the other.
To find information, various kinds of logical scales are used, which
are placed at the beginning of the array of characteristics for a
certain object. These scales record the presence of characteristics
and their dimension.
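
A logical scale can be sketched as a bit mask placed before the
nonzero values. This sketch records only the presence of each
characteristic, not its dimension, and the function names are
assumptions made for illustration.

```python
def encode(values):
    """Return (scale, present): bit i of scale is 1 if values[i] is nonzero."""
    scale = 0
    present = []
    for position, value in enumerate(values):
        if value != 0:
            scale |= 1 << position
            present.append(value)
    return scale, present


def decode(scale, present, count):
    """Rebuild the full line of values from the scale and the nonzero values."""
    stored = iter(present)
    return [next(stored) if (scale >> position) & 1 else 0
            for position in range(count)]


line = [0, 7, 0, 0, 3]
scale, present = encode(line)
print(bin(scale), present)
print(decode(scale, present, len(line)))
```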

Selection  of  a  method  of  distribution  of  information  tables  can 
be  solved  after  their  construction  and  solution  of  the  problem  of 
presentation  of  information. 

Questions  of  development  of  methods  of  data  processing  are 
not  less  complicated.  For  purposes  of  transmission  and  processing, 
economic  information,  as  a  rule,  is  grouped  in  reports.  At  present 
there  have  been  developed  basic  principles  of  processing  of  data  on 
control  of  naval  transport.  As  a  basis  of  developed  algorithms 
there  is  assumed  the  principle  of  clear  separation  of  algorithms 
of  accumulation  of  information  from  algorithms  of  delivery. 

Algorithms of accumulation of information are algorithms of
input and restoration of information, and algorithms of delivery are
algorithms of obtaining answers to inquiries and preparation of
initial data for solution of individual problems. A method of
construction of algorithms and programs of processing of operational
information about the work of the naval fleet has been developed
by Tseytin, the senior scientific colleague of the A. A. Zhdanov
Leningrad State University. As a basis of algorithms of data
processing there is assumed treatment of separate elementary reports
containing one or more characteristics of objects.
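
The separation of accumulation from delivery can be sketched as two
methods over one depository, with each elementary report carrying one
or more characteristics of a single object. The class and method
names and the report layout are hypothetical and are not the method
attributed to Tseytin in the text.

```python
class Depository:
    def __init__(self):
        self._records = {}

    def accumulate(self, report):
        """Accumulation: input or update characteristics from one report."""
        obj, characteristics = report
        self._records.setdefault(obj, {}).update(characteristics)

    def deliver(self, obj, names):
        """Delivery: answer an inquiry without modifying stored data."""
        record = self._records.get(obj, {})
        return {name: record.get(name) for name in names}


depository = Depository()
depository.accumulate(("vessel-1", {"port": "Leningrad", "operation": "loading"}))
depository.accumulate(("vessel-1", {"operation": "departed"}))   # later report
print(depository.deliver("vessel-1", ["port", "operation", "cargo"]))
```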

The first variant of IPS for automated control of the work of
a fleet will be created on the basis of the Experimental Computer
Center of the Naval Fleet in Baltic Steam Navigation.


EXPERIMENTAL INVESTIGATIONS OF COMPARATIVE EFFECTIVENESS
OF MANUAL AND MECHANIZED IPS IN THE N. K. KRUPSKAYA
LENINGRAD STATE INSTITUTE OF CULTURE


A. V. Sokolov, D. I. Blyumenau, R. F. Grinina,
and A. M. Sorkin

The present work deals generally with the problems, method,
and results of three interconnected experiments conducted in
1964-1966 in the N. K. Krupskaya Leningrad State Institute of
Culture (LGIK) under the general names of "Lastochka," "Estafeta,"
and "Raduga." Experiments were prepared and conducted by a 10-man
initiative group. The practical target of the investigations
consisted in obtaining initial data for determination of rational
fields of application of manual and mechanized information retrieval.

A. Problems of Experiment

Effectiveness is a relative concept. The conclusion that an
IPS is more effective can be made only on the basis of its comparison
with other IPS. In LGIK experiments manual card files were compared
to mechanized IPS of the descriptor type. Effectiveness of IPS
was estimated according to the following indices, determining both
the final useful effect and the expenditures made for its
achievement:

1. The quality of work, determined by information losses and
information noise.

2. Labor input of creation.

3. Speed of operation.

The main criterion of effectiveness was considered to be
information losses, and the main attention was paid to exposure of
regularities affecting this index. Experiments were conducted on
the basis of three different collections of documents on library
matter and scientific information with the use of both inquiries
formulated by the experimenters themselves and inquiries assigned
by information users.
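
The text does not give formulas for these indices. The sketch below
assumes one common reading: information losses as the share of
relevant documents that were not delivered, and information noise as
the share of delivered documents that are not relevant. Both
definitions, the function name, and the sample sets are assumptions.

```python
def losses_and_noise(relevant, delivered):
    """Return (loss ratio, noise ratio) for one inquiry, under the assumed definitions."""
    relevant, delivered = set(relevant), set(delivered)
    missed = relevant - delivered          # relevant but not delivered
    extra = delivered - relevant           # delivered but not relevant
    loss = len(missed) / len(relevant) if relevant else 0.0
    noise = len(extra) / len(delivered) if delivered else 0.0
    return loss, noise


# Invented document numbers for one inquiry.
loss, noise = losses_and_noise(relevant={1, 2, 3, 4}, delivered={3, 4, 5})
print(loss, noise)
```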

Manual IPS were investigated in two directions:

a) appraisal of subject and systematic catalogs¹ accepted in
library and information practice, called "traditional IPS" from
now on;

b) exposure of promising ways of departure from traditional
practice within the bounds of manual retrieval.

Descriptor IPS participating in experiments were created in
accordance with rules generalized in [1] and can be considered
typical representatives of the given class of IPS. Descriptor
language was represented in the form of a thesaurus on library matter
and scientific information developed at the N. K. Krupskaya LGIK.

Appraisal of effectiveness of traditional IPS as compared to
descriptor systems is the purpose of the works "Lastochka" and
"Estafeta." Analysis of catalogs and card files existing in
information services and libraries established that the following
peculiarities are inherent to them as IPS:

1. As the IPYa there serves an a priori assigned hierarchical
diagram of classification (for systematic catalogs) or an empirical
list of subject headings with cross references (for subject
catalogs).

¹In this work we do not distinguish between the terms "card
file" and "catalog."

2. There is used a criterion of semantic conformity "on
inclusion," according to which in delivery there are included
documents carrying indices equal or subordinated with respect to the
indices of the retrieval pattern of the inquiry.

3. A document yields an average of 1-2 independent indices
and, accordingly, the norm of duplication is not more than 1.5
cards per document.

4. Realization in the form of a manual-retrieval card file.
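
The criterion of semantic conformity "on inclusion" from item 2 can be
sketched for a hierarchical classification. The dotted index notation,
in which a subordinate index extends its parent, and the sample card
file are assumptions made for illustration.

```python
def conforms_on_inclusion(document_index, inquiry_index):
    """True if the document index equals or is subordinate to the inquiry index."""
    return (document_index == inquiry_index
            or document_index.startswith(inquiry_index + "."))


def retrieve(card_file, inquiry_index):
    """Deliver documents carrying at least one conforming index."""
    return sorted(
        document for document, indices in card_file.items()
        if any(conforms_on_inclusion(index, inquiry_index) for index in indices)
    )


# Invented card file: document -> indices assigned to it.
card_file = {
    "doc-1": ["2.1"],
    "doc-2": ["2.1.3"],     # subordinate to 2.1
    "doc-3": ["2.10"],      # a different class, not subordinate to 2.1
    "doc-4": ["3", "2.1.7"],
}
print(retrieve(card_file, "2.1"))
```

Comparing against the index plus a separator keeps "2.10" from being
mistaken for a subdivision of "2.1".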

Traditional card files for LGIK experiments were developed by
several bibliographers with a good knowledge of accepted library
practice. Their problem was to create bibliographical card files
satisfying contemporary requirements for such card files. There
were no limitations on the finishing of available classifications,
depth and detailedness of indexing, or degree of duplicating of
cards. Bibliographers worked independently of each other. It is
significant that the card files obtained as a result corresponded to
the above-formulated "traditional norm," which once again confirmed
the validity of these norms.

Table 1 gives the conditions of carrying out the experiments
"Lastochka" and "Estafeta" and the obtained results. As can be
seen from the table, the difference in indices of information
losses of traditional card files and descriptor IPS clearly exceeds
the bounds of possible inaccuracy of experiment; therefore a
conclusion can be drawn concerning the superiority of typical
descriptor IPS over typical traditional card files in the sense of
the quality of retrieval. This conclusion is important in itself,
but is still insufficient for determination of rational fields of
application of manual and mechanized retrieval technology. There
remained open the question of the possibility of improvement of
manual IPS by way of deviation from certain "traditional norms." To
determine ways of increasing the quality of work of traditional IPS
we will analyze the sources of information losses and information
noise inherent to them and ways of removing them:


1. Apriority of classification diagrams utilized for
construction of systematic catalogs. Preassigned classifications are
in no way oriented to the available file of documents; ideas and
links essential for description of the documents of a given file
are always absent in them. To compensate for the apriority of
classification diagrams, they are finished in the process of
exploitation. As a result the a priori assigned diagram approaches
a classification diagram developed empirically, proceeding from the
subjects of the given array, and theoretically in the limit should
merge with it. During the carrying out of the experiment it is
possible to exclude the influence of apriority by way of use of an
empirically composed classification.

2. Insufficient depth and detailedness of indexing, due to
which the information contents of the document are not completely
reflected in the retrieval pattern of the inquiry. The traditional
method, as shown above, establishes a norm of duplicating of not
more than 1.5 cards per document, thereby limiting the possibilities
of indexing. In principle this limitation can be removed without
changing the specific character of the system.

3. The linearity of indices of manual catalogs, manifested in
the fact that heuristic functions are inherent to only the left
part of the complicated index, whereas the remaining components
(in particular, model subheadings and determinants) do not
fulfill heuristic functions. Removal of linearity of indices
within the limits of manual card files is possible by way of
duplicating of cards, introducing as many cards per document as
there are elements in the retrieval pattern of the document.
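
The effect of duplicating cards can be sketched by comparing two ways
of filing the same retrieval patterns. The documents, the elements of
their patterns, and the function names are invented for illustration.

```python
def linear_file(patterns):
    """One card per document, filed under the leftmost element only."""
    cards = {}
    for document, elements in patterns.items():
        cards.setdefault(elements[0], []).append(document)
    return cards


def duplicated_file(patterns):
    """One card per element, so every element becomes an entry point."""
    cards = {}
    for document, elements in patterns.items():
        for element in elements:
            cards.setdefault(element, []).append(document)
    return cards


patterns = {
    "doc-1": ["catalogs", "indexing", "thesaurus"],
    "doc-2": ["indexing", "cards"],
}
print(linear_file(patterns).get("thesaurus", []))
print(duplicated_file(patterns)["thesaurus"])
```

The price of the extra entry points is the number of cards: here five
cards instead of two.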

The possibility of decreasing information losses within the bounds
of manual IPS is covered by "Raduga." The task of "Raduga" was to
construct manual-retrieval card files in which the above-mentioned
sources of information losses are removed. The problem of "Raduga,"
corresponding to the second direction of investigations of manual
IPS, was formulated in the following way: do there exist
objective causes preventing the obtaining of identical completeness
of delivery of information in manual card files and in model
descriptor IPS under equal conditions of processing and retrieval
of documents? In order to check the authenticity of estimated data
obtained in "Lastochka" and "Estafeta," the "Raduga" program provided
for the creation of traditional subject and systematic catalogs.

A traditional systematic catalog was organized according to the new
Soviet Library-Bibliographic Classification (BBK).

In contrast to preceding experiments, where IPS were constructed
independently of each other, in the case of "Raduga" it was required
to provide coordination of untraditional manual card files with the
descriptor IPS contrasted to them in order to exclude the influence
of subjectivity of operators of these systems. Such a measure is
necessary inasmuch as any IPS belongs to the class of "man-machine"
systems and, therefore, the effectiveness of its fulfillment of
functions depends both on the objective possibilities of the IPS
as a machine and on the subjective qualities of the operator
working with it. If one does not provide for measures for
exclusion or, in the extreme case, equalizing of the influence
of subjective factors, then the study of objective characteristics
of the system becomes impossible. In the "Raduga" experiment the
following measures were taken to compensate for subjectivity:

a) information-retrieval languages were equalized with respect
to semantic force;²

b) description of information contents of documents (indexing)
was carried out with an identical degree of depth and detailedness;

c)  programs  of  information  retrieval  on  inquiries  were 
coordinated; 

d)  retrieval  results  were  evaluated  on  the  basis  of  single 
criteria. 

In order to satisfy the first condition, for an untraditional
systematic catalog there was developed a Special Decimal
Classification (SDK) of literature on library science, bibliography,
and scientific information. The SDK was constructed according to the
type of traditional "enumerating" diagrams: structurally it presents
a single hierarchical "tree" of ideas, there is applied decimal
notation and a scanned system of model divisions (determinants), and
provision is made for the possibility of formation of complicated
indices with the help of "colon" and "plus" signs. A peculiarity
of SDK is that its glossary matches the glossary of an empirically
compiled thesaurus. Therefore, one may assume that SDK is an
empirically created classification. Comparing SDK with the a priori
assigned BBK, it is impossible not to note the considerably greater
complexity of SDK tables. SDK comprised 1109 different indices,
whereas in the BBK card file only 482 were used. The problem of
equalizing the semantic force of the descriptor language and the
language of subject headings was solved in the process of formulation
of subject headings by way of coordination of them with the retrieval
pattern of the document in descriptor language. The subject headings
of the untraditional catalog frequently included three or more
subheadings, whereas in the traditional subject catalog the structure
of the headings was much simpler (as a rule, title and subtitle).


²Semantic force means the possibility of describing phenomena
by means of a given language. Semantic force determines the possible
depth and detailedness of indexing.

To compensate for the subjectivity of indexers in the untraditional
IPS and in the descriptor IPS, the retrieval pattern of the document
was standardized. Standardization consisted in the fact that the
retrieval patterns of documents composed in the various IPYa included
the same idea. Thus, identical depth and detailedness of indexing were
ensured. Standardization does not ensure absolutely correct indexing
(this is practically impossible); the purpose of standardization is to
provide an identical level of errors and inaccuracies in all
untraditional IPS.

For every independent SDK index, apart from determinants, a separate
card was entered in the card file. In exactly the same way, an
additional card was started for every significant word of a subject
heading, with the exception of model subheadings, which correspond to
SDK determinants. As a result, as Table 1 shows, the degree of
duplication in the untraditional catalogs considerably exceeded the
usual library norms. In the descriptor IPS an average of 5.65
descriptors per document were used during indexing.

To get rid of subjectivity in understanding inquiries and composing
the retrieval program, the retrieval patterns of inquiries for the
untraditional manual card files and the descriptor system were
intercoordinated in such a way as to standardize the retrieval pattern
of the inquiry in exactly the same way as the retrieval pattern of the
document was standardized.

The relevance of documents was evaluated by a competent commission on
the basis of the single principles shown below. To eliminate
tendentiousness in the evaluation of relevance, the experiments were
organized in such a way that members of the commission did not know
which system had issued a given document.

Thanks to the measures described above, two sources of losses in
traditional IPS were excluded: apriority of the classification diagram
and insufficient depth of indexing. The linearity of indexation of
manual systems was also partially compensated. It is true that the
compensation of linearity was not complete, since heuristic functions
were not given to SDK determinants and model subject subheadings.
Calculation showed that if this were done, the volume of the
untraditional catalogs would increase 2-5 times more, and the degree
of duplication would reach 6-7 cards per document.

B.  Method  of  Experimenting 

In experimental investigations of IPS, the method adopted in the
Cranfield project [2] has become widespread. By this method every
inquiry participating in the experiment is formulated on the basis of
a source document, arbitrarily selected from the file, in such a way
that the source document completely answers it. The number of the
source document is reported together with the inquiry to the person
doing the retrieving. Retrieval on the inquiry continues until the
IPS gives out the source document or the permissible retrieval
variants are exhausted. In the first case retrieval is considered
successful, and in the second unsuccessful. The total percentage of
unsuccessful retrievals determines the information losses.
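The source-document procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy file `FILE`, the term-matching `search` function and the inquiry variants are hypothetical and do not come from the Cranfield project or the LGIK experiments.

```python
from typing import Callable, Dict, List, Set, Tuple

# An inquiry is the number of its source document plus the permissible
# retrieval variants (each variant is a set of index terms).
Inquiry = Tuple[int, List[Set[str]]]


def source_document_losses(inquiries: List[Inquiry],
                           search: Callable[[Set[str]], Set[int]]) -> float:
    """Percentage of inquiries whose source document is never issued."""
    if not inquiries:
        raise ValueError("at least one inquiry is required")
    unsuccessful = 0
    for source_number, variants in inquiries:
        # Retrieval stops as soon as one variant issues the source document.
        if not any(source_number in search(terms) for terms in variants):
            unsuccessful += 1
    return 100.0 * unsuccessful / len(inquiries)


# Hypothetical file: document number -> index terms.
FILE: Dict[int, Set[str]] = {
    1: {"catalog", "subject"},
    2: {"thesaurus", "descriptor"},
    3: {"punched", "card"},
    4: {"abstract", "journal"},
}


def search(terms: Set[str]) -> Set[int]:
    """Issue every document indexed with all the requested terms."""
    return {number for number, indexed in FILE.items()
            if terms.issubset(indexed)}


inquiries: List[Inquiry] = [
    (1, [{"catalog", "systematic"}, {"catalog"}]),  # found on 2nd variant
    (2, [{"descriptor"}]),                          # found at once
    (3, [{"perforated"}, {"tape"}]),                # variants exhausted
    (4, [{"journal", "abstract"}]),                 # found at once
]

losses = source_document_losses(inquiries, search)
print(losses)  # 25.0: one unsuccessful retrieval out of four
```

Note that the sketch records only whether the source document was issued; it says nothing about the other documents issued along with it.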

The advantages of the Cranfield method are simplicity and convenience
of experimentation. In the opinion of the authors of the project, the
method they propose makes it possible to exclude the complicated
question of evaluating the relevance of documents to the given
inquiry, since the number of the source document is known beforehand.
At the same time there are doubts with respect to the reliability of
this method [3, 4, 5]:

1. It is possible to assume that the probability of non-issuance of a
source document depends on the information losses inherent to the
IPS, but this assumption requires proof.

2. Every inquiry corresponds to a source document whose connection
with the inquiry is not stipulated by the method. Instead of the
problem of relevance between the inquiry and the documents obtained as
a result of retrieval, there appears the problem of relevance between
the source document and the inquiry made on the basis of it. For
strict carrying out of the experiment it is necessary to formulate a
clear criterion of conformity between the contents of the source
document and the inquiry. To do this is as difficult as to stipulate
the conditions of relevance of a random document to a given inquiry.
Thus, despite the affirmation of the authors, the Cranfield method
does not exclude the problem of relevance.

3. It is impossible to use "real" inquiries of users, which are in no
way connected with the experimental collection of documents.

4. The Cranfield method does not permit calculating IPS information
noise.

In the N. K. Krupskaya LGIK there was developed a method of
experimentation that is, in our opinion, more exact and more
complicated, intended for comparative investigations of two or more
IPS. According to the LGIK method, information losses and information
noise for n inquiries are calculated directly by formulas (1) and (2):

$$L = \frac{\sum_{i=1}^{n} \left(S_i - S_i^{r}\right)}{\sum_{i=1}^{n} S_i} \cdot 100\%, \qquad (1)$$

$$N = \frac{\sum_{i=1}^{n} \left(S_i^{t} - S_i^{r}\right)}{\sum_{i=1}^{n} S_i^{t}} \cdot 100\%, \qquad (2)$$

where $L$ is information loss; $N$ is information noise; $S_i$ is the
total number of relevant documents in the array for the i-th inquiry;
$S_i^{r}$ is the number of relevant documents issued for the i-th
inquiry; and $S_i^{t}$ is the total number of documents issued in
answer to the i-th inquiry.
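A minimal sketch of how losses and noise might be computed from the three counts defined above. It assumes that losses are the share of relevant documents not issued and that noise is the share of issued documents that are irrelevant, both pooled over all inquiries and expressed in percent. The pooling and the sample counts are assumptions made for illustration, not data from the experiments.

```python
from typing import List, Tuple

# One tuple per inquiry:
# (relevant documents in the array, relevant documents issued,
#  total documents issued)
Counts = Tuple[int, int, int]


def losses_and_noise(results: List[Counts]) -> Tuple[float, float]:
    """Return (information losses, information noise) in percent."""
    total_relevant = sum(r[0] for r in results)
    issued_relevant = sum(r[1] for r in results)
    issued_total = sum(r[2] for r in results)
    if total_relevant == 0 or issued_total == 0:
        raise ValueError("no relevant or no issued documents to evaluate")
    losses = 100.0 * (total_relevant - issued_relevant) / total_relevant
    noise = 100.0 * (issued_total - issued_relevant) / issued_total
    return losses, noise


# Hypothetical counts for three inquiries.
results = [(10, 8, 10), (5, 4, 8), (5, 4, 7)]
losses, noise = losses_and_noise(results)
print(losses, noise)  # 20.0 36.0
```

Here 16 of the 20 relevant documents were issued (losses of 20%), and 9 of the 25 issued documents were irrelevant (noise of 36%).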

The problem of relevance is solved on the basis of the following
considerations. Among the documents issued by an IPS in answer to an
inquiry there are always documents which can be confidently recognized
as relevant or, conversely, irrelevant to the given inquiry. The
authenticity of the determination sharply increases if it is carried
out jointly, by a special commission. Uncertainty in the judgment of
relevance extends only to a certain, as a rule small, number of
documents. These documents introduce inaccuracy into the experiment,
but the presence of "indefinite" documents does not negate the
possibility of carrying out the experiment. Practice showed that with
the joint method of determining relevance this "uncertainty" is
successfully resolved. In the LGIK the following leading rules have
been established for the evaluation of the relevance of documents to
an inquiry:

1. Relevance must be evaluated on the basis of the source used during
indexing. If indexing is carried out according to an annotation or
abstract, then accessing the primary source or conjecturing its
contents is not allowed.

2. Relevance is determined irrespective of the reader's assignment of
the document (theoretical article, patent, popular work), if the
reader's assignment is not stipulated in the request.

3. If the document concerns a less general idea than the idea
assigned in the inquiry, then the document is considered relevant;
if, however, a more general idea appears in the document, then it is
not recognized as relevant.
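Rule 3 can be illustrated with a small hierarchy of ideas. The sketch below assumes that ideas form a tree in which each idea has at most one broader idea, and that a document on the same idea as the inquiry counts as relevant. The hierarchy `BROADER` is hypothetical and serves only as an example.

```python
from typing import Dict, Optional

# Hypothetical hierarchy: idea -> next broader idea.
BROADER: Dict[str, str] = {
    "subject catalog": "catalog",
    "systematic catalog": "catalog",
    "catalog": "library science",
}


def is_relevant(document_idea: str, inquiry_idea: str) -> bool:
    """True if the document idea equals the inquiry idea or is narrower."""
    idea: Optional[str] = document_idea
    while idea is not None:
        if idea == inquiry_idea:
            return True
        idea = BROADER.get(idea)  # climb one level toward broader ideas
    return False


print(is_relevant("subject catalog", "catalog"))   # True: less general idea
print(is_relevant("library science", "catalog"))   # False: more general idea
```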

The described method makes it possible to determine the number of
relevant documents issued for every inquiry and every IPS. The total
number of relevant documents in the array for the i-th inquiry,
$S_i$, is taken to be the total useful delivery of all IPS. Such an
assumption is based on the fact that the deliveries of the systems, as
practice shows, are complementary to one another, i.e., relevant
documents not issued by one system are issued by another, and vice
versa. The more systems participate in the experiment, the nearer this
sum approaches the hypothetical value of $S_i$ appearing in formula
(1). It must be borne in mind that an absolutely accurate
determination of information losses is not necessary to come to a
conclusion about the superiority of one IPS over another; it is enough
to be convinced that one IPS issues a greater number of relevant
documents than another, i.e., to obtain the difference in the
information losses of the investigated IPS.
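The pooling idea in this paragraph can be sketched as follows. The sketch assumes that each system's delivery has already been reduced to the documents the commission judged relevant, and that the pooled set for an inquiry is the union of these deliveries. The system names and document numbers are hypothetical.

```python
from typing import Dict, List, Set


def pooled_losses(issued_relevant: Dict[str, List[Set[int]]]) -> Dict[str, float]:
    """Losses (percent) per system, measured against the pooled delivery.

    issued_relevant maps a system name to one set per inquiry holding
    the numbers of the relevant documents that system issued.
    """
    systems = list(issued_relevant)
    n_inquiries = len(issued_relevant[systems[0]])
    # Total useful delivery of all systems, inquiry by inquiry.
    pooled = [
        set().union(*(issued_relevant[s][i] for s in systems))
        for i in range(n_inquiries)
    ]
    total = sum(len(p) for p in pooled)
    return {
        s: 100.0 * (total - sum(len(d) for d in issued_relevant[s])) / total
        for s in systems
    }


# Hypothetical deliveries of two systems for two inquiries.
deliveries = {
    "manual catalog": [{1, 2, 3}, {7}],
    "descriptor IPS": [{2, 3, 4, 5}, {7, 8, 9}],
}
losses = pooled_losses(deliveries)
print(losses)  # {'manual catalog': 50.0, 'descriptor IPS': 12.5}
print(losses["manual catalog"] - losses["descriptor IPS"])  # 37.5
```

The absolute values depend on how complete the pool is, but the difference between the two systems is what the comparison needs.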

The  LGIK  method  has,  in  our  opinion,  the  following  merits: 

a)  direct  calculation  of  information  losses  and  information 
noise  increases  authenticity  of  obtained  results; 

b)  the  possibility  of  using  "real"  inquiries  of  users; 

c)  the  possibility  of  calculating  Information  noise. 

The deficiencies of the method are the necessity of the participation
of a minimum of two IPS in the experiment, and the merely relative
accuracy of the determination of $S_i$.

C.  Results  of  Experiments  and  Conclusions 

Table 1 gives actual data of the LGIK experiments. The values of the
indices of effectiveness of traditional IPS obtained in different
experiments closely match, in spite of the fact that in every
experiment these IPS were composed by different specialists on the
basis of different literature bases, and different sets of inquiries
were used. Consequently, these indices are of an objective nature
and do not depend on the subjectivity of the operator or the
conditions of the experiment. In particular, it is possible to
deduce the following mean information losses and labor consumption
of input of one document in traditional and descriptor IPS
(Table 2).
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Table  2. 


IPS                                 Information   Information   Input time of
                                    losses, %     noise, %      document, min

Systematic catalogs (traditional)   53            60            2.26
Subject catalogs (traditional)      55            21            2.0
Descriptor IPS                      12            21            9.21

Table  2  data  show  that  the  quality  of  work  and  labor  consumption 
of  creating  traditional  systematic  and  subject  catalogs  are  on  one 
level.  Information  losses  in  typical  traditional  catalogs  are 
higher  than  in  typical  descriptor  IPS,  but  the  time  spent  processing 
documents  in  the  first  is  four  times  less.  Thus,  the  first  problem 
of  the  experiments  (evaluation  of  the  effectiveness  of  traditional 
retrieval  technology)  can  be  considered  carried  out. 
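The comparison drawn from Table 2 can be checked with a little arithmetic. The sketch below simply divides the tabulated values; the dictionary keys are labels chosen here for illustration.

```python
# Values as given in Table 2: losses (%), noise (%), input time (min).
TABLE_2 = {
    "systematic catalogs": {"losses": 53, "noise": 60, "input_min": 2.26},
    "subject catalogs": {"losses": 55, "noise": 21, "input_min": 2.0},
    "descriptor IPS": {"losses": 12, "noise": 21, "input_min": 9.21},
}


def ratio(quantity: str, numerator: str, denominator: str) -> float:
    """Ratio of one tabulated quantity between two kinds of IPS."""
    return TABLE_2[numerator][quantity] / TABLE_2[denominator][quantity]


# Input time: descriptor IPS relative to each traditional catalog.
time_vs_systematic = ratio("input_min", "descriptor IPS", "systematic catalogs")
time_vs_subject = ratio("input_min", "descriptor IPS", "subject catalogs")

# Information losses: each traditional catalog relative to descriptor IPS.
loss_systematic = ratio("losses", "systematic catalogs", "descriptor IPS")
loss_subject = ratio("losses", "subject catalogs", "descriptor IPS")

print(round(time_vs_systematic, 1), round(loss_systematic, 1))
```

On these figures, indexing a document for the descriptor IPS takes a little over four times as long as for either traditional catalog, while the traditional catalogs lose roughly four and a half times as much information.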

In order to answer the question of whether there are objective causes
preventing the attainment of an identical level of information losses
in manual card files and descriptor IPS (the second problem of the
investigation), let us analyze the causes of the appearance of losses
in the untraditional catalogs of the "Raduga" experiment (second
stage). The experiment showed that the information losses in these
catalogs differ from those of the descriptor IPS, in spite of the
steps taken to compensate for the subjectivity of the operators of
these IPS. The following sources of information losses were exposed
(Table 3).

Technical  errors  were  headpiece  of  cards  or  their  omission 
during  retrieval,  incorrect  writing  of  index  on  card,  absence  of 
an index entering the standard retrieval pattern, and incorrect
punching or error during readout of punched cards.
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Table  3. 


Sources of losses            Information losses, %

                             SDK     Subject catalog    Descriptor
                                     (untraditional)    IPS

Technical errors             11.9    1.2                5.9
Subjectivity of indexing     2.8     2.7                2.8
IPYa defects                 4.3     2.1                0.3
Linearity of indices         20.2    8.9                -

All                          39.2    14.9               7.0
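The "All" row of Table 3 should equal the sum of the individual sources in each column, which is easy to verify. The short check below uses the values as printed. It shows that the SDK and subject-catalog columns add up, whereas the descriptor IPS column adds up to 9.0 rather than 7.0; reading the technical-error entry as 3.9 would reconcile that column, but this is only a conjecture.

```python
import math
from typing import List


def column_adds_up(sources: List[float], total: float) -> bool:
    """True if the individual sources of losses sum to the stated total."""
    return math.isclose(sum(sources), total, abs_tol=1e-9)


# Columns of Table 3 as printed (percent of information losses).
sdk = [11.9, 2.8, 4.3, 20.2]
subject_catalog = [1.2, 2.7, 2.1, 8.9]
descriptor_ips = [5.9, 2.8, 0.3]  # no losses from linearity of indices

print(column_adds_up(sdk, 39.2))              # True
print(column_adds_up(subject_catalog, 14.9))  # True
print(column_adds_up(descriptor_ips, 7.0))    # False: the entries sum to 9.0

# Conjectural reading of the first entry that reconciles the column.
print(column_adds_up([3.9, 2.8, 0.3], 7.0))   # True
```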

Subjective errors of indexing were defects of the standard retrieval
pattern (insufficient depth or detailedness of indexing). IPYa defects
consisted in the absence of basic links between IPYa elements useful
for retrieval. Losses due to linearity of indices of manual IPS are
caused by the impossibility of retrieval according to model
subheadings and determinants, since these language elements were not
given heuristic functions.

It is obvious that there will always be technical errors in systems of
the "man-machine" type, which include a person as one of their links.
In exactly the same way, errors connected with the subjectivity of
indexers are irremovable, although thanks to standardization of
retrieval patterns these errors can be brought to one level. The
inevitability of the two sources of information losses shown proves
the impossibility of practical construction of real IPS having zero
information losses.

Losses caused by IPYa defects and linearity of indices are connected
with the specific character of the IPS. In descriptor IPS the
influence of the first source of information losses is insignificant,
and the second source is absent. Inasmuch as basic links were
established by IPS operators on the basis of erudition and intuition,
omissions and errors in exposing them are inevitable in any IPS. But
the specific character of the language of manual systems is that in
them the probability of the appearance of defects is greater than in
descriptor IPS. With an increase in the depth and detailedness of
indexing, the structure of the hierarchical classification diagram and
of the dictionary of subject headings quickly becomes complicated. At
the same time, achieving such quality of indexing in descriptor IPS
does not require much complication of the thesaurus, thanks to the
fact that every word of descriptor language possesses heuristic
functions. It is not difficult to see that the "simplicity" of
descriptor language is the result of the basic distinctive peculiarity
of systems of coordinate indexing: the possibility of retrieving
according to any IPYa elements and their combinations. Manual means of
realization do not allow such a possibility. Thus, in the case of
manual IPS a closed circle is obtained: to decrease losses of
information it is necessary to increase the depth and detailedness of
indexing, which cannot be done without developing a complicated IPYa
possessing great semantic force. In turn, complication of the IPYa
inevitably leads to errors and omissions during its creation and use,
which involves information loss.


Linearity of indexation in manual catalogs, as shown above, can be
completely excluded by increasing the degree of duplication of cards.
In the "Raduga" experiment we declined to form independent indices on
the basis of determinants of the classification and model subheadings,
since this would contradict the main purpose of these elements: to
serve as auxiliary means for more accurately defining the basic IPYa
indices. But in principle such a measure is not excluded.

Returning to the main question posed in the "Raduga" experiment, we
can ascertain that no theoretically irremovable obstacles were
revealed to bringing the level of information losses in manual card
files to the level of information losses in descriptor IPS, if the
initial data are equal. However, practical achievement of this level
is hampered by the increasing complication of manual systems. In
systematic catalogs a practically irremovable source of information
losses is apriority of classification, since introduction of empirical
diagrams apparently is impossible.

Conclusions 

1. It is practically impossible to construct bibliographic IPS
possessing an "ideal" quality of work, i.e., zero information losses.

2. Traditional IPS having a comparatively simple IPYa structure and
using small depth and detailedness of indexing have, independently of
the principle of organization of the card file (subject or
systematic), approximately higher information losses than model
descriptor IPS.

3. The tendency to compensate for the sources of information losses
of traditional IPS within the bounds of manual retrieval leads to an
increase in the volume of card files and in the labor consumption of
their creation, and to considerable complication of the IPYa
structure, which is a potential source of information losses.

4. Descriptor IPS in principle must provide a minimum level of
information losses, but realization of such systems takes much more
work than realization of manual IPS.

The actual data obtained by us can be used in further investigations
aimed at determining the rational fields of application of various
IPS. In this we see the basic meaning of the experiments conducted at
the N. K. Krupskaya LGIK.
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INFORMATION  RETRIEVAL  SYSTEMS 
R.  2.  Trukhayev  and  V.  V.  Khomenyuk 

A  description  of  an  information  retrieval  system,  its  functional 
stages  and  structure  is  given. 

1. Information retrieval systems are one of the means and component
parts of an organ for making a decision (by which is understood a
group of people, one person, a technical device or a complex of
technical devices, etc.) during the study and explanation of the
character of functioning of different objects (processes, phenomena
and others) for the purpose of producing a decision on control of the
object, and also for the purpose of accounting for possible influences
on the part of this object (in case it actively or passively
counteracts).

Functioning of an object is examined as a sequence of transitions (in
time) of the object from one position x in an n-dimensional metric
space H into another possible position y ∈ H.

Two basic types of information retrieval problems are possible
(according to the character of the goal searched), depending on what
is required: to separate an object with certain criteria from a set of
analogous objects, or to determine the state of the object from a
certain set G of possible states.

By position of the object is understood, in the first case, the vector
of characteristic criteria of the object and, in the second case, the
vector of spatial coordinates.

Information retrieval systems are intended for determination of the
position of an object. What does application of information retrieval
systems give?

Information retrieval systems permit determining the position of an
object in time with a certain degree of definiteness, depending on the
time of functioning, construction and other parameters of the
information retrieval system. The information obtained about the
position of an object permits the organ making the decision to carry
out its functions: to make one or another decision in accordance with
the goals of its work. For example, an information retrieval system
should find the code of a book from a catalog and (if this is
necessary) find the book and issue it to the reader.

Further on we will briefly examine the principles of functioning,
capabilities, structure and criteria of work of information retrieval
systems.


2. The essence of functioning of an information retrieval system
consists in determining (finding) the position of an object on the
basis of a certain model of functioning of the object and of data
obtained during search, observation and processing of available
information about the object.

It is necessary to note that information retrieval systems should, as
a rule, carry out a purposeful or goal-directed process of retrieval
of information about the position of an object, since there are
different criteria and limitations on the parameters of functioning of
information retrieval systems.

Functioning of information retrieval systems can be split into the
following stages (see Fig. 1):

1) obtaining a requisition or order (from the organ making the
decision) and its preliminary analysis;

2) composition (or selection) and study of qualitative and
quantitative models of functioning of the object;

3) formulation of the information retrieval problem;

4) theoretical solution of the information retrieval problem posed;

5) practical performance of search;

6) check of correctness and fitness of data obtained by the
information retrieval system;

7) delivery of results to the organ making the decision.

Fig. 1. Stages of functioning of an information retrieval system.


The first stage consists in obtaining a requisition from the organ
making the decision, formulated according to set standard rules (in
the form of a formula, oral assignment and so forth). In the
requisition are indicated:

a) criteria of the object separating it from analogous objects, or
certain information about its position, for problems of the first and
second types respectively;

b) a requirement to find the actual object or a requirement to find
the position of the object;

c) criteria and limitations on the character of performance of the
requisition (time of performance, required accuracy and volume of
information, etc.);

d) the form of delivery of the information obtained by the
information retrieval system to the organ making the decision.

Preliminary analysis consists in establishing that the requisition is
formulated in accordance with the standard rules, and in a preliminary
appraisal of the possibility of using the obtained requisition in the
information retrieval system.
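The contents of a requisition and its preliminary analysis can be sketched as a small record with a validation step. This is a minimal illustration under stated assumptions: the class and field names, the particular checks and the list of supported delivery forms are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Goal(Enum):
    FIND_OBJECT = "find the actual object"
    FIND_POSITION = "find the position of the object"


@dataclass
class Requisition:
    object_criteria: Dict[str, str]            # item a
    goal: Goal                                 # item b
    max_time_min: Optional[float] = None       # item c: time of performance
    required_accuracy: Optional[float] = None  # item c: accuracy
    max_volume: Optional[int] = None           # item c: volume of information
    delivery_form: str = "written"             # item d


def preliminary_analysis(req: Requisition,
                         supported_forms: List[str]) -> List[str]:
    """Return the reasons the requisition cannot be used (empty if none)."""
    problems = []
    if not req.object_criteria:
        problems.append("no criteria or position information given")
    if req.max_time_min is not None and not req.max_time_min > 0:
        problems.append("time of performance must be positive")
    if req.delivery_form not in supported_forms:
        problems.append("unsupported form of delivery")
    return problems


# Hypothetical requisition: find a book through the catalog.
request = Requisition(
    object_criteria={"subject": "bibliography", "year": "1967"},
    goal=Goal.FIND_OBJECT,
    max_time_min=30.0,
)
print(preliminary_analysis(request, ["written", "oral"]))  # []
```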


The second stage consists in composing (selecting) a qualitative or
quantitative model of functioning of the object on the basis of study
of the actual object and of other indirect information about the
object.

The third stage consists in posing the information retrieval problem
on the basis of:

a) the given requirements from the requisition;

b) the model of functioning of the object;

c) the capabilities of the information retrieval system;

d) the criteria of functioning of the information retrieval system.

Mathematically, information retrieval problems are formulated as
problems of finding solutions to problems of mathematical logic,
problems of mathematical statistics and the theory of statistical
decisions, systems of equations and inequalities, on the one hand, or
as the solution of extremal problems in the presence of limitations,
etc., on the other hand.

The fourth stage consists in the development and application of
methods for solution of the posed information retrieval problem, and
also in a check of the correctness of the theoretical solutions of the
information retrieval problem obtained on the basis of these methods.

The fifth stage consists in practical performance, by the information
retrieval system, of the received requisition or order on the basis of
the solution of the information retrieval problem.

The sixth and seventh stages consist in a check of the correctness and
fitness of the data obtained by the information retrieval system and
in delivery of the obtained results to the organ making the decision.

3. In accordance with the stages of functioning, an information
retrieval system has the following structure (see Fig. 2).


Fig. 2. Structure of information
retrieval system.


An information retrieval system consists of retrieval tools and the
organ of control of the system.

The tools can constitute different technical devices, constructions
and attachments, and also complexes of technical devices together
with personnel. Without dwelling in detail on the principles of work
of retrieval tools, let us give an idea of the retrieval unit, the
space searched, the density of distribution of retrieval equipment,
and its volume.

Information retrieval systems consist of a set of varying quantities
of various types of retrieval equipment.

A retrieval unit of a certain type is that quantity of equipment of
this type which has its own control system and is intended for
production of a defined character of information about an object. A
retrieval unit of a system can be a certain technical device, a person
or group of people, a complex of technical devices with personnel,
etc., which are intended for production of information about the
object. The concept of a retrieval unit is conditional, just as any
other concept of a unit of weight, length, area, etc.

The space searched by the tools of an information retrieval system is
either the space and time coordinates of the tool, or the character
and quantity of information about the object (obtained from the
retrieval tool), or a combination of them.

Density of distribution of the equipment of an information retrieval
system is defined as the quantity of retrieval units of different
types falling within a unit measure of the space searched.

The volume of retrieval equipment of an information retrieval system
is defined as the total quantity of retrieval units of different types
distributed in the whole space searched. It is clear that the volume
is determined by the sum of the volumes of uniform equipment. Here by
volume of uniform equipment is understood the quantity of units of one
type distributed in the space searched.
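The definitions of volume and density can be made concrete with a short sketch. It assumes that retrieval units are counted per type and that the space searched has a single scalar measure; the equipment names and counts are hypothetical.

```python
from typing import Dict


def volume(units_by_type: Dict[str, int]) -> int:
    """Total quantity of retrieval units: the sum of the uniform volumes."""
    return sum(units_by_type.values())


def density(units_by_type: Dict[str, int],
            space_measure: float) -> Dict[str, float]:
    """Retrieval units of each type per unit measure of the space searched."""
    if not space_measure > 0:
        raise ValueError("the measure of the space searched must be positive")
    return {kind: count / space_measure
            for kind, count in units_by_type.items()}


# Hypothetical equipment distributed over a space of measure 2.0.
equipment = {"card file": 4, "punched-card sorter": 2, "operator": 6}
print(volume(equipment))        # 12
print(density(equipment, 2.0))
# {'card file': 2.0, 'punched-card sorter': 1.0, 'operator': 3.0}
```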
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Control  of  an  information  retrieval  system  constitutes  a 
technical device, a group of people, or a complex of technical devices
with personnel, intended for:

a) development of the distribution of retrieval tools;

b) control;

c) delivery to the organ making the decision.

The structure of control of an information retrieval system is
represented in Fig. 3.

Fig. 3. Structure of control of
information retrieval system.


According to the stages of functioning of an information retrieval
system, the system for development of the distribution of means of
retrieval carries out stages 1-4 and 6, the system of control carries
out the 5th stage, and the system of delivery to the organ making the
decision carries out the 7th stage.
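The assignment of the seven stages to the three parts of the control of the system can be written down as a simple lookup. The sketch below only restates the division given in the text; the identifier names are chosen here for illustration.

```python
from enum import IntEnum
from typing import List, Tuple


class Stage(IntEnum):
    REQUISITION_AND_ANALYSIS = 1
    MODEL_OF_OBJECT = 2
    PROBLEM_FORMULATION = 3
    THEORETICAL_SOLUTION = 4
    PRACTICAL_SEARCH = 5
    CHECK_OF_RESULTS = 6
    DELIVERY_OF_RESULTS = 7


DEVELOPMENT = "system for development of distribution of means of retrieval"
CONTROL = "system of control"
DELIVERY = "system of delivery to the organ making the decision"


def subsystem_for(stage: Stage) -> str:
    """Part of the control of the system that carries out the given stage."""
    if stage == Stage.PRACTICAL_SEARCH:
        return CONTROL
    if stage == Stage.DELIVERY_OF_RESULTS:
        return DELIVERY
    return DEVELOPMENT  # stages 1-4 and 6


# Order in which a requisition passes through the parts of the system.
schedule: List[Tuple[int, str]] = [(s.value, subsystem_for(s)) for s in Stage]
for number, part in schedule:
    print(number, part)
```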


The system of development of the distribution of means of retrieval of
an information retrieval system consists of 6 blocks:

a) block for reception of the requisition and its preliminary
analysis;

b) block of composition (selection) and study of the model of
functioning of the object;

c) block of formulation of the information retrieval problem;

d) block of theoretical solution of the information retrieval
problem;

e) block for coupling with the system of control of means of
retrieval of the information retrieval system;

f) block of control.


The structure of the system for development of the distribution of
means of retrieval of a system is represented in Fig. 4.
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Fig.  4.  Structure  of  system  of  development  of 
distribution of means of retrieval of an information
retrieval system.


4.  The  information  retrieval  problem  is  that  of  determination 
of  distribution  of  means  of  retrieval  of  the  system  in  the  space 
searched  taking  limitations  and  criteria  into  account. 


Limitations  and  criteria  of  the  problem  arise  from  requirements 
of  the  requisition  from  the  organ  making  the  decision,  the  accepted 
model  of  functioning  of  the  object,  capabilities  and  requirements  of 
work  of  the  system,  i.e.,  parameters  of  the  control  system  and  means 
of  retrieval  of  the  system. 


The limitations and criteria of the problem can be represented in
a definite form, in a probabilistic form (with assigned probabilistic
characteristics), or in an indeterminate form, in accordance with the
form of the requisition, the composition (selection) of the model of
functioning of the object, and the representation of the work of
control and of the parameters of the means of retrieval of the system.
For example, the requisition may show all, or perhaps not all, of the
data needed for its fulfillment. In addition, the model of the object
can be composed of definite, probabilistic, or indefinite couplings
(equations, inequalities, functional relationships, etc.). Furthermore,
knowledge of the parameters of the information retrieval system can
also be presented in a determined, random, or indeterminate form.

It  is  necessary  to  note  that  in  the  first  stage  of  posing  and 
solution  of  an  information  retrieval  problem  a  great  degree  of 
indeterminateness  (i.e.,  chance  and  uncertainty)  is  introduced  by 
parameters  of  the  model  of  functioning  of  the  object. 

In practice, limitations and criteria are expressed in different
forms; for example, criteria and limitations of work of an information
retrieval system can be:

a)  readiness  of  the  system  to  fulfill  the  requisition; 

b)  probability  of  fulfillment  of  the  requisition  per  unit  time; 

c)  probability  of  obtaining  an  assigned  volume  and  accuracy  of 
information  about  the  object  per  unit  time; 

d)  cost  of  fulfillment  of  the  requisition  per  unit  time; 

e)  time  of  fulfillment  of  the  requisition; 

f)  volume  of  means  of  retrieval; 

g)  degree  of  automation  of  the  system; 

h)  simplicity,  adaptability  (fitness  for  performance  of  a  large 
number  of  mixed  requests)  of  the  information  retrieval  system; 

i) fitness for execution of requests which can appear in the
future;

j) standardization of technical means of information retrieval
systems, etc.

Let us note that in different information retrieval problems one
of the above-indicated characteristics can be taken as the criterion of
optimization of work of the system, and the others as limitations.
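
The choice of one characteristic as the criterion and the others as limitations can be illustrated with a minimal Python sketch. The `Plan` record, its field names, and all numeric values are hypothetical illustrations, not data from the report; here cost is assumed to be the criterion, while time and probability of fulfillment act as limitations.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Plan:
    """A hypothetical way of fulfilling one requisition."""
    name: str
    cost: float         # cost of fulfillment (arbitrary units)
    time: float         # time of fulfillment (hours)
    probability: float  # probability of fulfillment


def cheapest_feasible(plans: List[Plan], max_time: float,
                      min_probability: float) -> Optional[Plan]:
    """Minimize cost (the criterion) subject to time and probability limits."""
    feasible = [p for p in plans
                if p.time <= max_time and p.probability >= min_probability]
    return min(feasible, key=lambda p: p.cost) if feasible else None


# Illustrative values only.
plans = [
    Plan("manual search", cost=5.0, time=40.0, probability=0.70),
    Plan("machine search", cost=12.0, time=2.0, probability=0.90),
    Plan("combined search", cost=20.0, time=6.0, probability=0.97),
]

best = cheapest_feasible(plans, max_time=8.0, min_probability=0.85)
print(best.name if best else "no feasible plan")
```

Swapping the roles (for example, minimizing time under a cost limit) only changes which field is passed to `min` and which fields appear in the filter.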

At any stage of functioning of an information retrieval system it
is desirable to use a contemporary EVTsM [electronic digital computer],
which permits shortening the time for performance of a requisition,
increasing the accuracy and volume of obtained information, etc., with
a more complete (exact) account of the character of functioning of the
object.

The stages of functioning examined here, and accordingly the
structures of information retrieval systems, are fairly general for a
broad class of information retrieval systems, both those in operation
and those being designed.


A SYSTEM OF AUTOMATIC DIFFERENTIATION OF DISTRIBUTION
OF INFORMATION (SARI-1) ON CONSTANT INQUIRIES
DEVELOPED IN TsBNTI AKIAE¹


A. I. Nadtochiy, V. P. Kalinin, N. N. Mikheyev,
Y. A. Voronin, V. I. Gostev, N. S. Denisenko,
and G. S. Chuba


Introduction 


Contemporary rates of scientific and technical progress are
connected with continuous growth of the flow of information materials
and sharply increasing needs for information.

The scientist and engineer need corresponding information at all
stages of solution of a scientific-research or engineering problem.


It becomes evident that at the contemporary stage of scientific
and technical progress the provision of information is one of the decisive
factors (and in many cases a bottleneck) in the successful progress of
scientific-research and research-design works.


In these conditions TsBNTI was faced with the problem of
creating a system of information service which would satisfy the needs
of concrete consumers, both with respect to time and with respect to
subjects.


¹Expansions unknown.


Solution of this problem began with creation of a reference and
information fund, the main component part of which would be unpublished
materials: reports on scientific-research and research-design works
conducted in foreign and domestic scientific-research and design
organizations. The principle of laying in a fund of these materials
was accepted in connection with the fact that these materials present
the greatest scientific interest and are not reflected in
periodical publications.


With creation and accumulation of a reference and information
fund, constant information service was undertaken for a number of
subscribers (scientific-research and design organizations, and also
big scientific organizations), to which, as the reference and
information fund was processed, information had to be issued on
those thematic problems on which they were working at the given time.

To solve the problem there was developed a system of automatic
differentiated distribution of information (SARI-1) with respect to
constant inquiries, with the help of the "Minsk-22" computer and on the
basis of descriptor language. The report gives the essence of this system.

I. Descriptor Dictionary (Thesaurus)

The list of descriptors accepted at the TsBNTI for the AIPS has
the form of an alphabetic dictionary with indication of digital codes.

Digital codes were assigned by simple numeration of the elements
of the different lists entering the dictionary:


basic dictionary       0001-1231
supplements to it      1500-2000
glossary               6000-8000
supplements to it      ?000-5000
countries              5000-5200
reactors               5200-6000
mass numbers           9000-...


Thus, all the elements of the dictionary have a four-digit
decimal code, but in paired combinations are recorded in an eight-digit
code. Digital coding, instead of the Euratom alphabetic coding, is
required for purely technical reasons. The descriptor dictionary is
built on the basis of a thesaurus already proved with respect to
frequency and a glossary accepted by Euratom and now recommended as
an international tool within the bounds of MAGATE [IAEA]. In the course
of practical application of this dictionary in 1964-1965 there appeared the
extreme necessity of its thematic and lexical expansion for a more
complete coverage of the thematic interests of the proposed subscriber
network of TsBNTI. For this it was necessary to supplement the Euratom
thesaurus with additional descriptors from the dictionary of the
Gmelin Institute (FRG) and the subject index of the American
abstract journal Nuclear Science Abstracts (NSA). The volume of the
basic dictionary increased from 1231 to approximately 1600 term-ideas.


Simultaneously, two divisions of compound descriptors were completely
excluded from the Euratom thesaurus:

a) compound descriptors of the type

ferric oxide        iodine isotopes     sodium sulfates
uranium oxide       carbon isotopes     potassium sulfates
etc.

with respect to 15 repeated descriptors for all chemical elements;

b) concrete isotopes of all chemical elements:

uranium-233         strontium-85        iodine-131
uranium-235         strontium-89        iodine-140
uranium-238         strontium-90        iodine-141
etc.

Instead of them, the instruction to the descriptor dictionary gives a
rule of free combination of any names of chemical compounds with their
concretizing chemical elements:

oxide; iron         = ferric oxide
isotopes; iodine    = isotopes of iodine
chlorides; sodium   = sodium chloride, etc.

This made indexing more flexible and permitted composing
such complicated descriptors as "uranium silicide," "molybdates of
ammonium" and others, which were not in the Euratom thesaurus.

For coding isotopes with their mass numbers the 9th division of the
decimal system was set apart. Mass numbers of any chemical
elements are expressed by a code with a 9 in front:

uranium-235 has the code 1168.9235
strontium-90 has the code 1070.9090, etc.
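
The mass-number rule can be sketched in a few lines of Python. This is only an illustration of the coding convention described above: the function name `isotope_code` is hypothetical, and the zero-padding of the mass number to three digits is an assumption that agrees with the two examples given.

```python
def isotope_code(element_code: str, mass_number: int) -> str:
    """Pair an element's four-digit code with a mass-number code (9 in front)."""
    if not 0 < mass_number <= 999:
        raise ValueError("mass number must fit in three digits")
    # The 9th division of the decimal system: "9" followed by the mass number.
    return f"{element_code}.9{mass_number:03d}"


print(isotope_code("1168", 235))  # uranium-235
print(isotope_code("1070", 90))   # strontium-90
```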


The terms of the Euratom glossary were almost completely introduced
into our dictionary with all references provided for by Euratom. They
were also given digital codes.

Furthermore,  the  dictionary  includes  coded  lists: 

a)  countries,  oceans  and  seas; 

b)  the  most  important  atomic  reactors; 

c) the best known thermonuclear installations;

d) a small number of adjectives allowing concretizing
such ideas as

temperature, low
temperature, high
uranium, natural
interactions, weak, etc.

As a result of all these reconstructions there was obtained a
dictionary that allows sufficiently deep indexing of any sources of
information on very wide subjects and coding of already prepared
descriptor patterns (Euratom descriptions) and subject indexes (NSA).

The dictionary, including glossary elements, contains about 9000
term-ideas, each of which can enter a paired combination if the
combination has meaning and turns out to be necessary for reflection of
fund material. The term-ideas "reactor" with code 0937 and "energy"
with code 0869 give in our case the compound idea "power reactor" with
code 0937.0869. The term-ideas "analysis" and "chemistry" permit
composing the terms

chemical analysis with code 0056.1533 and
analytical chemistry with code 1533.0056

by a single rule: in the first place "what," and in the second "which."

In spite of supplements to the Euratom dictionary and its
reconstruction to adjust it to our concrete needs, the code system of
the TsBNTI descriptor dictionary has a drawback: hierarchical or
associative connection is absent in the code reflection of descriptors.
For example, the ideas

nuclear reactor               0937
nuclear fuel                  0459
active zone                   0934
loop of heat-transfer agent   0259
retarders                     0705
heat-transfer agents          0260

are not interconnected with respect to code, although they form a
"field" of terms directly related to the idea "reactor," which is
hierarchically common to them. The absence of classification connections
in codes, characteristic of all official codes in contrast to functional
codes, hampers information retrieval and makes necessary multiple probing
of the fund with respect to the whole set of related descriptors, instead
of one or two inquiry descriptors having a broader meaning.
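
The consequence of this drawback can be shown with a short Python sketch. The codes are those listed above; the miniature fund (`FUND`, documents `D1` to `D3`) and the function names are hypothetical. Because the codes carry no hierarchy, probing with the broad code 0937 alone finds nothing, and the whole "field" of related codes has to be probed one by one.

```python
# Codes from the example above; no code reveals its relation to "reactor".
REACTOR_FIELD = {
    "nuclear reactor": "0937",
    "nuclear fuel": "0459",
    "active zone": "0934",
    "loop of heat-transfer agent": "0259",
    "retarders": "0705",
    "heat-transfer agents": "0260",
}

# Hypothetical fund: document label -> descriptor codes in its pattern.
FUND = {
    "D1": {"0459", "1022"},
    "D2": {"0260"},
    "D3": {"0874"},
}


def probe(fund, code):
    """Single probe: documents whose pattern contains the given code."""
    return {doc for doc, codes in fund.items() if code in codes}


def probe_field(fund, field):
    """Multiple probing: one probe per related descriptor, results merged."""
    hits = set()
    for code in field.values():
        hits |= probe(fund, code)
    return hits


print(sorted(probe(FUND, "0937")))                # broad code alone
print(sorted(probe_field(FUND, REACTOR_FIELD)))   # whole field of terms
```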

II. Processing Information Documents

A primary information unit, which requires separate consideration
and individual treatment, is a document or a part of one (a book
chapter, a separate work in a collection, and so forth) treating a
thematically isolated subject. For example, a report of a research
center may describe various works in the field of metallurgy of uranium
and other fuel materials. Each of the subjects touched upon is considered
an independent document and is processed individually.

An information retrieval card with bibliographic data, an annotation
or abstract (if necessary and practically possible), and the descriptor
pattern of the primary document (or of an information unit of it) is a
secondary document (VD) with a constant retrieval number and an index of
the universal address of the document, that is, the address assigned to
the document when it was composed. For reports this is a letter
abbreviation of the name of the laboratory and the reference number of
the report of this laboratory; for journal articles it is a four-letter
code of the name of the journal according to CODEN and numbers indicating
the volume, page, and year. The retrieval number appears at all
stages of processing, retrieval, and reproduction of primary documents.

The contents of primary documents (or their units) are reflected in
secondary documents in the form of a so-called descriptor pattern: a
list of elementary or compound descriptors expressing completely enough
the thematic essence of the given information unit. In a
descriptor pattern there should be nothing superfluous or secondary
from the point of view of successful retrieval of a given document on
inquiry. On the other hand, the pattern should contain all descriptors
corresponding to the most important aspects of the present work having
retrieval value.

On the face of the card accepted in TsBNTI there are indicated, in
fixed places, the retrieval number of the document, its authors and
title in the original spelling, the translation of the title into
Russian, an annotation or abstract, initial bibliographical information,
and an indication of the address of the abstract in NSA. On the other
side of the card there is given information about the publishing
character of the document, its form and execution, place of publication
and place of storage, and a list of the necessary descriptors. The
card is signed by its compiler. A sample information card for a
document in our fund is shown in Fig. 1.

The composed card is edited by a more experienced indexer,
especially the translation of the title of the work and its descriptor
pattern. Practice showed that even with the most conscientious
attitude to indexing and translation the editor almost always
improves the translation or the list of descriptors. Editing is done
by a worker with a good grasp of the language of the original and of
technical indexing.

[Fig. 1 card fields: retrieval No. (central fund, local fund); form of
material (article, report, book, patent, standard, firm's prospectus,
drawing, and others; microcard, microfilm, photocopy, translation;
underline as applicable); organization putting out material; location
of material; key words and their codes; card composition.]

Fig. 1. Form of information card, a) front; b) back.


The preface to the Russian-English variant of the descriptor
dictionary contains methodical indications on how to use the
dictionary during indexing of the primary document. The result of
indexing is the retrieval pattern of the original, i.e., the secondary
document. The rules of indexing documents consist in sequential
fulfillment of a number of operations.

1.  Professional  analysis: 

a) analysis of all forms of presentation of the primary document
from the point of view of their pithiness: title, abstract, thematic
table of contents (divisions), complete text, and others,

b) check of the possibility of breaking up the document into
independent information units (not to be confused with rubricational
division in subject cataloguing),

c) thorough study of the contents of the obtained information unit,

d) understanding of the system of basic ideas reflecting the
contents of the separated part of the document, with use of the method
of logical contrast within the limits of the paired structure "subject -
about subject" (or "subject - subject"), for example: "power
reactors" and "exploitation."¹

2.  Indexing: 

a) composition of a list of the separated idea-terms. One should
remember that use of binary terms (consisting of a determination and a
determined) promotes high concreteness of expression; one should not
use terminological compositions like "rim of wheel" or "American
reactors," and it is necessary to have recourse to terminological
unities, for example, "boiling-water reactors,"

b) finding in the descriptor dictionary lexical units for
expression of the most exact equivalents of the arranged ideas, and
"translation" or tracing of the ideas onto descriptors (in the form of
"simple descriptors") or descriptor syntagmas (i.e., "compound
descriptors") using the rules of grammatical synthesizing of a paired
determining connection, for example, "boiling-water reactors" =
"reactors boiling."

¹The subject will be most frequently expressed in general-technical
terms of the type "measurement," "breakdown," "production,"
"preparation," "treatment," etc.

3.  Coding: 

a) coding of descriptors; a two-part, four-digit digital code is used,
i.e., a four-digit code for the first component of the descriptor
syntagma and another four-digit code for the second component. In the
absence of an attributive component, the second component is replaced
by a four-digit code consisting of four zeroes (the so-called "zero
syntagma"). Between the two components of the code there is
placed a sign of coupling, designated by a point.
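
The coding rule can be sketched in Python as follows. The four codes are those quoted in Section I for "reactor," "energy," "analysis," and "chemistry"; the dictionary name `DICTIONARY` and the function `syntagma_code` are hypothetical names introduced only for this illustration.

```python
ZERO_COMPONENT = "0000"  # second component of the so-called "zero syntagma"

# A tiny excerpt of the descriptor dictionary (codes quoted in Section I).
DICTIONARY = {
    "reactor": "0937",
    "energy": "0869",
    "analysis": "0056",
    "chemistry": "1533",
}


def syntagma_code(determined, attribute=None):
    """Code a descriptor syntagma: determined term first, then its attribute.

    With no attributive component, four zeroes take the second place.
    The point between the components is the sign of coupling.
    """
    first = DICTIONARY[determined]
    second = DICTIONARY[attribute] if attribute is not None else ZERO_COMPONENT
    return f"{first}.{second}"


print(syntagma_code("reactor", "energy"))      # power reactor
print(syntagma_code("analysis", "chemistry"))  # chemical analysis
print(syntagma_code("chemistry", "analysis"))  # analytical chemistry
print(syntagma_code("reactor"))                # zero syntagma
```

The order of the arguments matters: exchanging the determined term and its attribute yields a different compound idea with a different code.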

As can be seen from the rules, in the retrieval language serving for
recording the contents of a document only two forms of coupling are
used at present: logical (or "semotactical") coupling between syntagmas,
i.e., that expressed by the opposition "subject-theme"; and syntactical
coupling between the components of a paired descriptor syntagma, i.e.,
the coupling "determined - determination" (the so-called "postpositive
attributive syntagma").

Inside the actual pattern separate descriptors are arranged in
groups in accordance with the basic elements: subject, theme, and
circumstance (or condition). This is necessary so that such a
"telegraph" recording in some measure replaces or duplicates the
information abstract. An example is the pattern depicted in this
way on the attached card No. 250127 in Fig. 2.

III. Processing of Inquiries
1.  Form  of  Inquiry 

At the very beginning of creation of SARI-1 in TsBNTI there
were determined the consumers¹ of information, selected in the process
of work of the AIPS [Automatic IPS]: scientific-research, planning,
and design organizations of the State committee.

¹In TsBNTI the consumer of information, which is a whole
establishment, is named "object SARI" with its registration number.

[Fig. 2 shows filled information card No. 250127 (dated Oct. 1964),
with descriptors grouped as subject I, subject II, and circumstance.
Legible entries include sintering 1022, preparation 0874, chemical
polishing 4197.0351, and etching 0390.]

Fig. 2. Form of filled information card,
a) front; b) back.

Every consumer was assigned a conditional number (from 01 to
N...) and was sent a notification of acceptance of the given
enterprise into the number of SARI-1 subscribers, together with an
instruction on the formulation and descriptor description (or descriptor
scanning) of the subject interesting this enterprise. The instruction
offered the following form of inquiry to be sent to the TsBNTI.


The usual card, the dimensions of which were indicated beforehand,
had to be filled in on both sides. On one side there was to be written
a so-called "formulation of inquiry," i.e., the name of
the interesting subject in natural (Russian) language, and on the back
the "descriptor description of the inquiry." For the latter the
attached descriptor dictionary was used. The purpose of a "descriptor
description," which is more correctly called "descriptor scanning,"
was that the "description," first, brought to the TsBNTI inquiry group
a more detailed expression of the subject, and secondly,
played the role of a normalized AIPS language. The latter is very
important, since this was normalization not only in the direction of
retrieval language, but also normalization in the terminological
sense.

Experience showed that descriptor descriptions of inquiries by
subscribers turned out to be of low quality, due to inability to use
descriptor language and to the complexity of the actual method of
indicative reviewing of a subject, which in point of fact is what
"descriptor description" is. For example, we will take one of the
inquiries sent:

Formulation of inquiry: "Obtaining by way of gas
electrolysis of volatile compounds of refractory
metals."

Descriptor description: "Refractory materials."

Here the descriptor description does not make the subject of the
inquiry more definite but expands it, not to mention that "refractory
materials" are not only metals; the class of refractory metals includes
all metals with a melting point higher than the melting point of iron.

In particular, during the analysis of this inquiry in TsBNTI there
were 16 descriptors in the list: molybdenum, tungsten, niobium,
tantalum, titanium, vanadium, chromium, zirconium, ruthenium, rhodium,
palladium, hafnium, rhenium, osmium, iridium, platinum.

Later the decision was taken to establish feedback with the
subscribers submitting inquiries. The most effective method of such
contact is direct conversation with the inquirer. Started in 1966, this
practice has not yet been brought to completion but has already given
good results. In particular, after conversation with the subscriber
of the above-cited inquiry, out of sixteen refractory metals there
remained only four, which considerably decreased input volume during
retrieval and, consequently, lowered the cost of machine work.

Distribution of information documents by constant inquiries is
carried out periodically, with the putting of new information material
into the AIPS, beginning immediately after the form of the inquiry is
obtained. The form of inquiry obtained by TsBNTI is entered in the book
of constant demands and from this moment serves as the "information
inquiry questionnaire" of the given subscriber. In TsBNTI retrieval by
constant inquiry is regular and, since the reference and information
fund (SIF) of SARI grows through the processing of entering materials by
a comparatively small number of informant-indexers, a single input of
documents into the AIPS cannot be great. At present the AIPS gets
300 documents a week. The retrieval system is fed 182 constant
thematic inquiries from a number of organizations.

2.  Study  of  Subjects  of  Inquiry 

In connection with the fact that it is necessary to concretize
clearly the expression of the inquiry, study of the subject of the
latter turns out to be almost the most important aspect of work during
the analysis of entering inquiries. During descriptor treatment of an
inquiry we faced not only the question of its thematic reference but
also the problem of expression of its contents, where the latter is
immeasurably more important, since during retrieval the task before the
informant is not to reproduce a phenomenon physically, and not even to
explain its meaning to some person, but to express it in the language
of descriptors. The "descriptor description" sent by the author of the
inquiry helps only insignificantly in this case during the analysis of
the subject of inquiry, since it most frequently does not fulfill its
assignment. Moreover, if the "formulation of inquiry," being
superficial, often leads to error, then the "descriptor description"
gives too many variants in an attempt to relate the inquiry to a branch.

Inasmuch as the subscriber must give a clear account of the
subject interesting him, anticipation of the usual difficulties during
"formulation" and "descriptor description" made it necessary to refer
to the thematic divisions of the dictionary during selection of
descriptors. In connection with this it is possible to give
two cases, which turned out to be the most typical.

A. "Formulation" of Inquiry has "Tight" Filling

Example: "Obtaining of different classes of organic compounds
containing radioactive isotopes. Questions of introduction of
radioactive isotopes in proteins, amino acids, nucleosides,
nucleotides, nucleic acids, vitamins, hormones, antibodies,
benzene, naphthalene, and their derivatives."

Descriptor  description 
Methods  of  obtaining 
Organic  compound 
Radioactive  isotopes 

B. "Formulation" of Inquiry has "Weak" Filling

Example:  "Zirconium  and  its  alloys." 


Descriptor  description 
Zirconium 

Alloys  of  zirconium 

Corrosion 

Oxidation 

Oxides 

Metallurgy 

Diagram  of  states 

Intermetallic  compounds 

Diffusion 

Mechanical  properties 

Adsorption 

Hydrogen 

Soldering 

Welding 

Heat  treatment 

Irradiation 


The difference between the two examples is absolutely evident, but
as regards thematic reference, in the first case it is easier to
trace from the "formulation of inquiry," and in the second from the
"descriptor description." With such an approach to thematic
analysis, in case "A" the subject is one of the ideas selected
from the formulation of the inquiry, and the theme is what
is expressed by one of the ideas contained in the "descriptor description"
and designating the process of introduction of radioactive isotopes
into an organic compound.

It is not difficult to note that case "B" requires absolutely
the reverse approach. Here the theme is the general content of the
"descriptor description," i.e., "metallurgy," and the subject is
"zirconium" or "alloys of zirconium" (or their synonyms).

From the point of view of retrieval language every inquiry is
considered an expression which must contain a subject and what is
said about this subject, i.e., its theme. "Representatives" of a
subject are usually selected from a number of concrete, subject
ideas, but "representatives" of a theme are expressed by names of
processes, operations, phenomena, states, or their sets (in example
"B" such a "joint" idea is the descriptor "metallurgy," which is why
it is determined as the theme). In connection with this, knowledge
of the logical system of ideas of the dictionary of retrieval language
is absolutely necessary, or it must at least be grasped intuitively
if the researcher has no skills of formal analysis of language. With
respect to the two examples given, with "tight" and "weak" filling,
one should say the following: during the shaping of an inquiry
extremes on either side are impermissible. An inquiry with "tight
filling of formulation" (case A) is no longer one inquiry but many
inquiries with matching themes and, consequently, is subject to
breaking up. An inquiry with "weak filling of formulation" (case B) is
an inquiry containing the subject of an expression but without a
theme, which must be looked for in the "descriptor description."
Therefore, the following rules cover composition of an inquiry:

1) construct the inquiry in the form of an expression containing
one subject and one theme in natural (not descriptor) language;

2) in the descriptor description under the heading "П" enumerate
all descriptors having to do with the subject of the expression. Select
descriptors from the dictionary, but new ideas may be introduced if
necessary;

3) the same as in the above paragraph, for the theme, headed "T."

Only if these rules are observed will an inquiry have a so-called
"transparent structure." Inquiries sent to the TsBNTI in any other form
did not correspond to this formula and made it necessary to conduct
"structural analysis of the inquiry."

3.  Structural  Analysis  of  an  Inquiry 

Structural analysis proceeds from the preceding procedure of
thematic analysis and has as its purpose to turn the initial inquiry
into a group of its modifications (subinquiries).

The structure of an inquiry is expressed in the form of a
relation identical to the logic function "and," i.e., it assumes
"single-placeness" or "simultaneity" of both parts of the basic
structure, i.e.,

П ∧ T,

where the sign ∧ transmits the "and" relation. Without going into a
detailed account of this question, let us note only that structural
analysis involves two forms of actions or operations.


The first operation is horizontal scanning of the structural
formula, i.e., enumeration of all components of both the subject and
the theme of an inquiry. Thus, for example, for case "B," given
in the preceding paragraph, scanning will have the following form:

П: zirconium, zirconium alloys
∧
T: metallurgy, corrosion, diffusion, oxidation, mechanical properties, etc.*

*For the sake of brevity neither horizontal nor vertical scanning is
carried to completion.

The second operation is "vertical scanning." This means that
from the thesaurus there have to be selected all synonyms of both
the subject and the theme. For example:

П                               ∧   T

zirconium (zirconium alloys)        metallurgy (corrosion)   of oxide
1. zircaloy                         1. heat treatment        1. of dioxide
2. alloy A-816                      2. cold treatment
3.                                  3. pressing
4.                                  4.


As a result of horizontal and vertical scanning we have two
subsets, subset "П" and subset "T," and within the limits of each of
them there acts the relation of "various-site," expressed by the "or"
function.

In the course of structural analysis false associations can be
brought to light (i.e., combinations not confirmed by the meaning of
the inquiry or by common sense). This makes it possible, first, to
avoid uninformative inquiries, and secondly, to break "complicated
inquiries" down into subinquiries.

4.  Modification  of  Inquiry  and  "Working  Inquiry" 

Modification of an inquiry is its breakdown into variants with
respect to certain considerations determined by its structure or the
possibilities of descriptor presentation. Inasmuch as the subject or
theme can be represented by sets with internal "or" relations,
the number of possible combinations П ∧ T can be very great. Thus, for
example, if the subject and the theme have 10 members each,
theoretically it is possible to form 10 × 10 = 100 modifications (in
practice fewer). An inquiry belonging to the real modifications
introduced into the AIPS is called a working inquiry,* in contrast to
the initial thematic inquiry.

*That is, the inquiry in the form in which it is introduced
into the retrieval system.
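
The count of theoretically possible modifications can be illustrated
with a short sketch. It assumes that the subject and the theme are each
given as a plain list of alternative terms; the function name and the
sample term lists are hypothetical and serve only to show the "and"
pairing of two "or" sets.

```python
from itertools import product


def modifications(subject_terms, theme_terms):
    """Pair every subject term with every theme term (the "and" relation)."""
    return list(product(subject_terms, theme_terms))


# Illustrative term lists; real ones would come from the thesaurus.
subject = ["zirconium", "zirconium alloys", "zircalloy"]
theme = ["metallurgy", "corrosion"]

for subject_term, theme_term in modifications(subject, theme):
    print(f"{subject_term} AND {theme_term}")
```

With 10 members on each side the same function yields the 100 pairs
mentioned in the text; in practice many of them would be discarded as
false associations.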

Indexing of Inquiry. Indexing of inquiries is the next stage of
treatment and at the same time the last condition determining the
quality of SARI work and AIPS effectiveness.

Indexing can be conditionally called "marking of the inquired
subject" and provision of the inquired theme with criteria according
to which the document in the file will be identified during retrieval.
Hence, naturally, descriptor presentation of an inquiry must meet
certain requirements, nonobservance of which will lead to information
losses or information noise at the output of automated retrieval. Noise
can be uncovered by examining the set of issued documents and comparing
it with the inquiry, but this is considerably easier than examining the
whole file.

We are not trying to get rid of noise altogether but to minimize
it. The fight for IPS quality is a fight for lowering of noise,
inasmuch as losses need not come under discussion at all, since there
should not be any of them or at worst they have to be insignificant.

To combat losses in the document list there are several procedures
which, in the end, lead to drawing on the paradigmatic connections of
the dictionary, i.e., analyzing connections between descriptors.

The basic rule of search control is: the more identification
criteria (descriptors) in an inquiry, the lower the probability of
their completely matching the descriptors of the retrieved document.

The number of descriptors (i.e., the depth of retrieval) is a
very important AIPS parameter, since an increase in the depth of
retrieval, i.e., an increase in the accuracy of the inquiry, can lead
to losses; a decrease in depth decreases losses but can increase noise.
Such, unfortunately, is one of the regularities of probabilistic
retrieval. Usually an average depth of inquiry is found (after a
certain number of experiments). At the TsBNTI the average depth for
SARI-1 inquiries is 2-3 descriptors connected by an "and" function
(for quantitative calculations, 2.5) if the average number of
descriptors in the document is 15. Thus, full coincidence of an inquiry
with a document is most probable when the number of descriptors of the
inquiry is to the number of descriptors of the document as 1 is to 6.

The hard part of indexing is reaching the average parameter of
the inquiry without distorting the contents of the inquiry. The fact
is that only in structural relation does an inquiry consist of two
members, П and T. But a member of the syntagma (syntactical
combination) of an inquiry in most cases does not match an individual
unit of the descriptor dictionary. More frequently each member, П and
T, is expressed by two units, but if an inquiry consists of four
descriptors, then there appears a threat of losses. For example:

Inquiry: "Economics of production of uranium fuel"

Structure: П - (uranium, fuel) as П1, П2
           T - (economics, production) as T1, T2

Indexing: Uranium - Fuel - Production - Economics

As can be seen from the example, the working inquiry consists of four
descriptors, which threatens information losses since there surely
will be found a document in which information is contained on this
subject, and the descriptor "Production" or the descriptor "Economics"
is absent. Therefore, the inquiry in the form obtained after analysis
is modified into two synonymic constructions:

a) uranium - fuel - production

b) uranium - fuel - economics

Instead of one thematic inquiry, two subinquiries were obtained.

Another example is the still more complicated case in which
inquiry density is so high that losses are inevitable:

Inquiry: "Fast-neutron power reactors with high
burn-up of fuel and increased reproduction."

Here the author of the inquiry does not explain just what about
this reactor he needs, and, therefore, we consider that he needs
everything on this class of reactors, and consequently:

Structure: П1 - power reactor
           П2 - fast neutrons
           П3 - high burn-up of fuel
           П4 - increased reproduction
           T - 0

Indexing: Power reactor - fast neutrons - burning out -
reproduction

When even one member of the syntagma is expressed by more than
two descriptors, it is possible to say beforehand that the probability
of finding the document is insignificant. In this case we furthermore
arrive at the modifications

M1 - power reactor - fast neutrons - burning out
M2 - power reactor - burning out - reproduction
M3 - power reactor - fast neutrons - reproduction
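
The breakdown of a four-descriptor indexing into three-descriptor
modifications can be sketched as follows. The sketch assumes that every
modification must keep the descriptor "power reactor," which is an
inference from the three modifications listed rather than a stated
rule; the function name and its parameters are hypothetical.

```python
from itertools import combinations


def subinquiries(descriptors, size=3, required=None):
    """List all descriptor combinations of the given size.

    If `required` is given, keep only combinations that contain it.
    """
    combos = combinations(descriptors, size)
    if required is None:
        return list(combos)
    return [combo for combo in combos if required in combo]


indexing = ["power reactor", "fast neutrons", "burning out", "reproduction"]
for number, combo in enumerate(subinquiries(indexing, 3, "power reactor"), 1):
    print(f"M{number}:", " - ".join(combo))
```

Without the `required` argument the function returns a fourth
combination (fast neutrons, burning out, reproduction), which does not
name the reactor and is absent from the list above.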

Besides modification we usually use the so-called "method of
substitution" or replacement. This is possible only if descriptor
scanning of the inquiry after indexing allows, according to the laws of
descriptive lexicology, replacing the "sum of ideas" with one idea in
the thesaurus. Here, instead of the four descriptors of scanning, a
substitution (S) of the same value (with distortion of meaning, of
course) is possible:

^  ^1  — 

Substitution is extended here basically to "reactors" and
"increased reproduction of fuel," i.e., substitution is partial; but
in this sense the purpose of introducing variants, subinquiries,
modifications, etc., is to collect all partially synonymic expressions
consisting of not more than three descriptors, so that their sum
completely covers the meaning expressed in the inquiry. Each of
the variants can give and surely will give noise, but then the losses
are avoided which inevitably appear if the machine is fed the complete
scanning of the inquiry. Incidentally, it is necessary to note that
such cases as the last example are few. It remains only to add that
substitution is only an intermediate procedure in the sense that the
substitute ("replacement") can be combined with other "unreplaced"
scanning descriptors such as, for example:

M - breeders - fast neutrons

However, the number of modifications is then increased to such
dimensions that it is necessary to remember to save the volume of
input memory of the ETsVM.

Correction of Inquiry. Everything said above brings us to the
final stage of processing inquiries: correction of working inquiries
on the basis of analysis of feedback. Inasmuch as AIPS effectiveness
is estimated as a percentage of noise information and information
losses, the AIPS output (delivery of documents) gives us the material
which contains data for correction of inquiries at any of the
enumerated stages of treatment. It is necessary to note that an error
in the inquiry or its inaccuracy at the very first stage (presentation
of the inquiry by the subscriber) does not depend on the work of the
TsBNTI inquiry group, and such errors are removed only in conversation
with the subscriber during personal contact. Obviously, taking
measures to remove errors of this type is very important, since if an
error is allowed at the very beginning of the process of work, it
will spread with passage through the subsequent phases of treatment,
and its dimensions will be greater the more intermediate operations
there are up to the moment of input.

Regarding the correction of inquiries at the remaining phases of
treatment, we do this regularly as material accumulates, by studying
output tabulograms and comparing output data with the retrieval array
and the inquiries at a given stage of treatment.

5.  Evaluation  of  Informative  "Weight" 

After the above it is easy to see that formulation of an inquiry
is subordinated to the language system on the one hand and the system
of descriptor retrieval on the other. This means that all operations
accomplished with inquiries are, in point of fact, among the most
important methods of controlling the retrieval process. But to keep
down noise and losses in TsBNTI retrieval, one more procedure is
subsequently to be used, "evaluation of informative weight" of the
descriptors of the inquiry, which consists in the following.

Ideas introduced into the group of inquiry descriptors possess
various degrees of information recognizedness. Of all the descriptors
some have very direct linkage with the inquiry (subject or theme),
others have mediated linkage, and for a third group the linkage is so
weak that a descriptor can be considered optional (i.e., both its
presence and its absence in the inquiry can be regarded as debatable).
Now, if we arrange all inquiry descriptors by the method of
"diminishing criterion" (the so-called "gradual series"), then in the
first place we have the most "informative" descriptor and in the last
the least informative, for example:

Inquiry: Investigation of the contents of
strontium-90 in the atmosphere.

Descriptor description: 1. strontium-90
                        2. atmosphere
                        3. contents
                        4. investigation*

Gradual series: strontium-90 - atmosphere -
contents - investigation

The series contains four components. If we assume that the number of
gradations must not exceed four, as in this case, then we could
"evaluate" the informativeness of every descriptor by its place in the
series, designating thereby its informative valence or so-called
"informative weight" (U):

1. strontium-90, U = 4
2. atmosphere, U = 3
3. contents, U = 2
4. investigation, U = 1

*For simplicity of account we take the "ideal" descriptor
description, omitting the analysis expounded in the preceding
subdivisions.


The total sum of weights is 10 units of valence. Taking this
number as the initial constant, we can assume that correct delivery
of information is within limits from 7 to 10 (or within other limits
which are set by a series of statistical checks). It is not
difficult to note that, setting the optimum number at 9, we can
dispense only with the descriptor "investigation," which in fact
insignificantly affects the outcome of retrieval. Thus, we obtain one
more method of controlling retrieval.
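
The weighting procedure can be illustrated with a minimal sketch. It
assumes that weights are assigned purely by position in the gradual
series (first descriptor n, last descriptor 1) and that a descriptor
may be dropped when the remaining sum of weights stays at or above a
chosen limit; the function names and the one-at-a-time dropping rule
are hypothetical.

```python
def informative_weights(gradual_series):
    """Weight each descriptor by its place: the first gets n, the last gets 1."""
    n = len(gradual_series)
    return {descriptor: n - place for place, descriptor in enumerate(gradual_series)}


def droppable(gradual_series, minimum_total):
    """Descriptors that can be omitted singly without the sum falling below the limit."""
    weights = informative_weights(gradual_series)
    total = sum(weights.values())
    return [d for d, w in weights.items() if total - w >= minimum_total]


series = ["strontium-90", "atmosphere", "contents", "investigation"]
print(informative_weights(series))
print(droppable(series, 9))
```

With the limit set at 9 only "investigation" can be omitted, as in the
text; lowering the limit to 7 would also permit omitting "contents" or
"atmosphere," one at a time.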

IV. The Structural-Functional Diagram of the SARI-1

The SARI-1 AIPS is a descriptor retrieval system working on
differentiational distribution of information. This means that there
is a set of inquiries, and the basic principle of the work process
consists in serial comparison of "descriptor descriptions" of documents
with "descriptor presentations" of inquiries. The result in the form of
a tabulogram indicates that a given inquiry corresponds to a given
subset of documents. A tabulogram is a list of digital codes of a
finite set of inquiries which registers the subset of the retrieval
(i.e., identification) numbers* of those documents the descriptor
description of which completely includes the descriptor presentation
of the given inquiry.

In the expounded meaning SARI-1, as an AIPS, belongs to a class
of retrieval systems, i.e., during work accessing an array of documents
consists in two phases of the process:

- comparison with a finite set of inquiries and

- differentiational distribution in linkage.

*Universal addresses in general.

An AIPS is a basic functional organ of the SARI-1 and occupies
in it a place which can be characterized as a "subsystem in a system,"
although the properties of these two systems are by far not identical.

As shown above, SARI is differentiational distribution of information,
in which it was decided to use the "Minsk-22" as the technical means
playing the role of an automatic retrieval device. Automation of such
distribution was recognized as expedient because the TsBNTI was set
the problem of processing a large information fund. Let us
enumerate the basic functioning parts of SARI.

The  flow  of  information  in  the  form  of  reports,  articles, 
abstracts,  etc.,  is  sent  to  the  library  of  the  information  fund. 

Processing  of  information  (primary  document  "PD")  for  the  purpose 
of  presenting  it  in  a  form  convenient  for  retrieval  from  the  point  of 
view  of  AIPS  requirements  (secondary  document  "VD")  is  carried  out  in 
the  division  of  the  reference  and  information  fund  by  an  indexing 
group  in  which  all  necessary  data  are  introduced  into  the  card  in 
the  form  of  formal  criteria  —  indices  (retrieval  numbers,  addresses, 
descriptors  with  digital  codes,  and  others). 

Form VD is an information card (form No. 1). Transfer of data
(indices) from the secondary document to the retrieval sample of the
document (POD), for example to a punched card, punched tape, or
magnetic tape (depending upon input method), is carried out in the
punching group of the section of mechanization of information processes.
Reception, analysis, processing, and modification of inquiries and
their conversion into VD form are carried out in the group of inquiries
of the section of the reference and information fund.

The SARI structure also includes:

A network of SARI objects, the subscribers of the information service.
At the first stage we limited ourselves to a comparatively small number
of subscribers: the leading scientific and engineering-technical workers
of scientific research institutes and design bureaus of the basic
thematic directions of our branch of science and technology. The
number of such thematic inquiries (or "inquiry-subscribers") obtained
at the TsBNTI was 182.

A network of peripheral information services directing information
documents (PD and/or VD) to the central fund. In the first place it
was decided to deposit in the fund reports of foreign centers and
firms working in the field of atomic science and technology, and also
open reports of scientific research institutes and design bureaus of
the Glavatom [Main Administration for the Use of Atomic Energy] system.

Solution of problems of automated information retrieval (AIPS), i.e.,
development of the algorithm and program and practical fulfillment of
retrieval, is carried out by the division of mechanization and
automation of information processes (OMAIP) of the TsBNTI and the
computer center of the Central Statistical Administration of the RSFSR,
where we are renting a "Minsk-22."

The processing of output tabulograms, which are the "output
production" of the AIPS, at present is carried out by a special AIPS
"output processing group."

Feedback. A copy of the information card found according to the
inquiry is sent to the subscriber with a special tear-off coupon
("stub") containing marks characterizing the reaction of the user
to the document in the fund, for example: "The document sent
interests him a) completely, b) partially, c) information on the
given question is obtained for the first time, d) material is known"
or "He needs a) abstract, full material; b) original microfilm or
microcard" or "The material obtained does not correspond to the
inquiry." The feedback coupons are sent back to the TsBNTI, where part
of them (with a positive answer) go to the distribution group, and
part (with a negative answer) go to the inquiry group for
analysis of unsatisfactory output and correction of input.

Such is the structural-functional diagram (Fig. 3). As can be
seen from the diagram, the functions of separate sections coincide
not by form or assignment of material (report or information card
as form VD; demand or information material, etc.), by the method of
its presentation, and by the methods of its processing.

Fig. 3. Diagram of SARI-1 work.

KEY: (a) Printing material; (b) Domestic and foreign
technical documentation; (c) Flow of information; (d)
Removal of copies; (e) Microfilms; (f) Microcards; (g)
Indexing of technical documentation; (h) Illegible;
(i) File of information cards; (j) Sampling of
information cards by found numbers; (k) Stub of information
card (feedback); (l) Copying of information cards; (m)
File of punched information cards; (n) Retrieval on the
"Minsk-22"; (o) Recording of inquiries on punch-cards;
(p) Processing and checking of inquiries; (q) User;
(r) Inquiries; (s) Characteristic of experimental
retrieval; (t) 1. Number of simultaneously processed
documents. 2. Number of simultaneously processed
inquiries. 3. Machine time of processing 300 inquiries
during retrieval.


At the beginning of this division we talked about "descriptor
description of a document" and "descriptor presentation of an inquiry."
The meaning of these two ideas is that information materials and
inquiries can equally be considered two varieties of document, of
which one is recorded in natural language, and the second of which is
processed in digital codes for moving retrieval data to the retrieval
sample of the document.


The secondary form of the inquiry somewhat differs from the
secondary form of the information document. The difference is that
while an inquiry is being processed it (as a rule) has no bibliographic
and factographic data; therefore, the corresponding secondary form
contains only an identification number and codes of descriptors
expressing the theme inquired about (see the section "Processing of
Inquiries" for greater detail).

V.  Machine  Realization  of  SARI-1 

1.  Introduction  Into  System.  Form  of  Presentation 
of  Primary  Information 

Documents presenting a volume of current information, on the one
hand, and constant inquiries reflecting the interest of users, on
the other hand, are put into the system.

The abstract of the document in the form of code-descriptors is
recorded on an information card. Descriptors are written as eight-
and four-digit digital codes, where the first, main part of the
eight-digit code appears as an independent code on the information card:

0746.0000 

0746.9017 
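
The relation between the two codes shown can be illustrated with a
small sketch. It assumes that the code is written as two four-digit
groups separated by a period and that the independent main code is the
first group followed by four zeros, as the pair 0746.0000 / 0746.9017
suggests; the function names are hypothetical.

```python
def split_code(code):
    """Split an eight-digit code such as "0746.9017" into its two four-digit parts."""
    main, _, detail = code.partition(".")
    if len(main) != 4 or len(detail) != 4 or not (main + detail).isdigit():
        raise ValueError(f"not an eight-digit descriptor code: {code!r}")
    return main, detail


def main_code(code):
    """Return the independent code formed by the main part, e.g. "0746.0000"."""
    main, _ = split_code(code)
    return f"{main}.0000"


print(split_code("0746.9017"))
print(main_code("0746.9017"))
```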

For input to the "Minsk-22" the contents of the information card
(registration number and code-descriptors) are recorded on punched
cards in the decimal system. Every punched card contains 9 descriptors,
and the average number of descriptors of a document is 20; consequently,
for every document 2 (two) punched cards are recorded. Both punched
cards of one document carry the same registration number of the
document. (The program provides for recording documents on punched
tape when necessary.) The system is fed 455 punched cards, or
approximately 270-300 documents, simultaneously.

Constant inquiries of users are broken up into subinquiries.
Every inquiry includes not more than 40 subinquiries. An individual
subinquiry contains 2-3 descriptors and is recorded for punch-card
input. The machine is simultaneously fed a group of 300 subinquiries
(punched cards). Formed and definitized groups of inquiries are
recorded on magnetic tapes.


An auxiliary program provides for printing a punched array of
punched cards of documents (inquiries) and obtaining the check
sum for a given group of cards. Printed data permit restoring a punched
card in case of loss or damage.

Unit I of the main program provides:

1) input of documents to the "Minsk-22" (punched card,
punched tape),

2) regulating of descriptors within every document,

3) creation of information tables about documents: the length
of the retrieval pattern of every document and the initial address of
its location in MOZU [magnetic working storage] (for the form see the
Appendix).

Unit  II  provides: 

1)  input  of  inquiries  to  the  machine  (punched  cards, 
magnetic  tape), 

2)  regulating  of  descriptors  within  each  inquiry, 

3)  creation  of  information  tables  about  inquiries  —  length 
of  retrieval  instruction  of  every  inquiry  and  initial  address  of  its 
location  in  MOZU  (Form  see  Appendix). 

2. Storage of Information in Machine.
Accumulation of Information

Provision is made for a direct form of storage of information in
the machine: a record is a sequence of descriptors located
after the document number.


+580011479   +580011480
+000560000   +000790000
+000561533   +002600000
+001650000   +003020000
+001900000   +003020108
+006290000   +004760000
+006570000   +004890000
+008240000   +004899198
+008640000   +006900000
+009930000   +009140000
+010270000   +009140108
+012040000   +016520000
+012040993   +060530000
+015450000
+016840000
+060630000


Every number and code-descriptor occupies one 37-bit cell. They
are stored in binary-decimal code. The difference between a
code-descriptor and the code of a document number is that the latter
has 5800 or a minus sign (-) in its highest positions.
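
The cell format can be illustrated with a sketch of binary-decimal
packing. The layout used here is an assumption, not something the text
specifies: one sign bit in the highest position followed by nine decimal
digits of four bits each, which together fill exactly 37 bits. The
function names are hypothetical.

```python
def pack_cell(number):
    """Pack a signed nine-digit decimal number into a 37-bit word (sign + 9 BCD digits)."""
    digits = f"{abs(number):09d}"
    if len(digits) != 9:
        raise ValueError("a cell holds at most nine decimal digits")
    word = 1 if number < 0 else 0          # assumed: sign bit in the highest position
    for ch in digits:
        word = (word << 4) | int(ch)       # four bits per decimal digit
    return word


def unpack_cell(word):
    """Recover the signed decimal number from a packed 37-bit word."""
    digits = []
    for _ in range(9):
        digits.append(word & 0xF)
        word >>= 4
    value = int("".join(str(d) for d in reversed(digits)))
    return -value if word & 1 else value


def is_document_number(word):
    """A document number carries a minus sign or the leading digits 5800."""
    value = unpack_cell(word)
    return value < 0 or f"{value:09d}".startswith("5800")


print(is_document_number(pack_cell(580011480)))  # document number
print(is_document_number(pack_cell(3020108)))    # code-descriptor
```

Nine digits at four bits each occupy 36 bits, so one further bit for
the sign accounts for the 37-bit cell length given in the text.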


For operational storage of the punched cards of documents for
retrieval, 4096 cells (words) of unit II of MOZU are assigned. For
operational storage of inquiries for retrieval, half of unit I of
MOZU (2000 words) is assigned; documents and inquiries are located in
order of input. The necessary input condition is an array of punched
cards ordered by the numbers of documents (inquiries). Provision is
made for accumulation of information on punched cards.

3.  Method  of  Retrieval 

Use is made of a simplified algorithm based on simple comparison
of the descriptors of inquiry and document. The first subinquiry is
selected by the table of information of inquiries (T.Z), and, in
sequence by the table of information of documents, the retrieval
instruction of the inquiry is intersected with the retrieval pattern of
every document. The resultant intersection list is compared with the
inquiry, and the delivery of the answer is echeloned.

Form of delivery of answer:

1. If all (n) descriptors of an inquiry match all or some
descriptors of a document, the number of the given document is issued
in the 1st echelon.

2. If n-1 descriptors of an inquiry match all or some descriptors
of a document, the number of the document is issued in the 2nd echelon.

3. If n-2 descriptors of an inquiry match, the number of the
document is issued in the 3rd echelon. (The program provides for cutoff
of the 2nd and 3rd echelons through a key; thus, during operation
of the system, a 3rd echelon providing for the matching of n-2
descriptors of the inquiry with the descriptors of the document ceases
to be necessary.)
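
The echeloned comparison can be sketched in a few lines. The sketch
treats inquiries and documents as sets of numeric descriptor codes and
intersects them; the identifiers, the sample data, and the rule that a
document sharing no descriptor at all is never issued are assumptions
made for illustration, not details taken from the program itself.

```python
def echelon(inquiry, document, max_echelon=3):
    """Return 1, 2 or 3 by the number of unmatched inquiry descriptors, else None."""
    inquiry, document = set(inquiry), set(document)
    matched = len(inquiry & document)
    if matched == 0:
        return None  # assumption: a document sharing nothing is never issued
    level = len(inquiry) - matched + 1
    return level if level <= max_echelon else None


def retrieve(inquiries, documents, max_echelon=2):
    """Compare every inquiry with every document; group document numbers by echelon."""
    tabulogram = {}
    for inquiry_id, inquiry in inquiries.items():
        hits = {}
        for document_id, descriptors in documents.items():
            level = echelon(inquiry, descriptors, max_echelon)
            if level is not None:
                hits.setdefault(level, []).append(document_id)
        tabulogram[inquiry_id] = hits
    return tabulogram


# Hypothetical document and inquiry numbers with illustrative descriptor codes.
documents = {
    101: {560000, 1650000, 1900000, 6290000},
    102: {790000, 2600000, 3020000},
}
inquiries = {1: {560000, 1650000}, 2: {560000, 790000}, 3: {9140000, 9930000}}
print(retrieve(inquiries, documents))
```

The `max_echelon` argument plays the part of the cutoff key: setting it
to 1 leaves only full matches, while the default of 2 reflects the
remark that the 3rd echelon ceased to be necessary.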


4.  Delivery  of  Information 

Retrieval  results  are  printed  on  an  ATsPU  (alphameric  printer). 

In  the  corresponding  columns  in  the  binary-decimal  system  there 
are  printed: 

a) the number of the inquiry and its retrieval pattern
(descriptors);

b) numbers of documents by echelons of delivery and descriptors
of documents which matched the descriptors of the retrieval pattern
of the inquiry.


5.  Reproduction  and  Distribution  of  Documents 

As can be seen from the flow chart (Fig. 3), after the tabulogram
has been processed, information cards corresponding to the inquiry are
selected from the fund. They are selected manually (cards are
arranged in order of retrieval numbers). Removed cards are examined
by a specialist for evaluation of contents (to decrease noise).

Information cards selected for the user are sent to an "ERA"
installation for reproduction. From the original mock-up two forms
(front and back) are prepared on the "ERA," from which impressions of
blanks are reproduced.

After a copy of the information card with a feedback stub has been
obtained, the copies are distributed to information users. From the
information users the "stub-cards" ("feedback") enter the group of
inquiries, where the inquiry is corrected (in case of a negative
answer) or materials are selected for reproduction. The order and
originals are transmitted to the microfilming laboratory or to the
section of operational reproduction, according to the form of work
which must be conducted. Microfilming is done on a UDK-2.

Laboratory  technology  has  been  developed  at  the  TsBNTI  for 
removal  of  copies  from  microcards  and  documents. 

Single copies are made by means of electrography. Copies
of small individual materials are taken by means of electrography
on the "ERA" (for sheet materials) and the "Electrophot" (for stitched
materials).

During field testing of the SARI-1 system over 1.5 years,
about 2000 originals were reproduced from stubs.

VI. Determination of Effectiveness of Automated
and Differentiational Distribution
of Information (SARI-1)

We analyzed a comparatively small array of documents; the data
obtained permitted reaching a simple conclusion with respect to ways of
improving the investigated system.

The basis of the given method of calculation is the method of
A. V. Sokolov. According to this method the parameters characterizing
the accuracy of IPS work are loss of useful information L and
information noise N%, which are determined by the corresponding
formulas:

$$L = \frac{S_t - S}{S_t} \cdot 100\%, \qquad N\% = \frac{N}{F} \cdot 100\%,$$

where S_t is the number of documents introduced into the IPS completely
or partially satisfying a certain inquiry (containing useful information
on a given inquiry); S is the number of documents issued by the IPS
containing useful information (part of S_t); N is the number of
documents issued by the IPS not containing useful information on a
given inquiry (information noise); F is the total delivery of the IPS
on a given inquiry (S + N).

Thus, S/S_t characterizes completeness of delivery, and N% accuracy
of delivery. Completeness and accuracy of delivery are the basic
characteristics determining effectiveness of IPS work.
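
The two parameters can be computed with a short sketch. It assumes the
formulas L = (S_t - S)/S_t and N% = N/F, each expressed in percent,
which is the reading that reproduces the percentages tabulated for the
inquiries below; the function names and the convention of zero noise
for an empty delivery are hypothetical.

```python
def loss_percent(relevant_total, relevant_issued):
    """Loss L: share of the relevant documents in the file that were not issued."""
    return 100.0 * (relevant_total - relevant_issued) / relevant_total


def noise_percent(noise_issued, total_issued):
    """Noise N%: share of the issued documents that are not relevant."""
    if total_issued == 0:
        return 0.0  # assumption: an empty delivery contains no noise
    return 100.0 * noise_issued / total_issued


# Figures of the kind reported for inquiry 12001: 8 relevant documents in
# the file, 11 issued, of which 7 are relevant and 4 are noise.
print(loss_percent(8, 7))
print(round(noise_percent(4, 11), 1))
```

For these figures the loss is 12.5% and the noise about 36.4%.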

The values S, F and N in the tables giving analysis of delivery of
documents by the system (Tables 1-5) are represented in the form of
fractions, the numerators of which are the number of documents issued
by the system in the 1st echelon, and the denominators of which are
the number of documents issued by the system in the 2nd echelon.

The criterion of delivery of a document in the 1st echelon is the
presence in the descriptor description of this document of all
descriptors of the inquiry; in the absence of one inquiry descriptor in
the descriptor description of the document, this document is issued by
the system in the 2nd echelon.


Table 1. Inquiry 12001. Power reactors with nuclear
superheating of steam: construction, parameters,
operational experience (energy reactors, power
installations, overheating, nuclear overheating,
superheaters). S_t = 8.

No.  Descriptor description                 S     F     N     Calculations of losses and noises
1.   power reactors + overheating           0/8   0/32  0/24  L = 0; N% = 75%
2.   power plant + overheating              0/7   0/11  0/4   L = 12.5%; N% = 36.4%
3.   power reactors + nuclear overheating   0/0   0/21  0/21  L = 100%; N% = 100%
4.   plants + nuclear overheating           0/0   0/3   0/3   L = 100%; N% = 100%
5.   power reactors + superheaters          0/8   0/8   0/0   L = 0; N% = 0
6.   power plants + superheaters            0/8   0/8   0/0   L = 0; N% = 0
     Altogether                             0/8   0/32  0/24  L = 0; N% = 75%

Table 2. Inquiry 12011. Operational experience
of boiling-water reactors (boiling-water reactors,
exploitation, reactors, boiling) S_r = 9.

No.  Descriptor description                 S     P     N     Calculations of losses and noise

1.   boiling-water reactors + exploitation  0/1   0/16  0/15  L = 85.7%; N% = 93.75%
2.   boiling + reactors + exploitation      0/1   0/16  0/15
     Altogether                             0/2   0/22  0/20  L = 71.4%; N% = 90.9%

Table 3. Inquiry 12019. Zirconium and its
alloys (zirconium, zircalloy, alloys of
zirconium) S_r = 12.

No.  Descriptor description                 S     P     N     Calculations of losses and noise

1.   zirconium                              4/0   14/0  10/0  N% = 71.4%
2.   alloys of zirconium                    6/0   6/0   0/0   L = 50%; N% = 0
3.   zircalloy                              3/0   8/0   5/0   L = 75%; N% = 62.5%
     Altogether                             12/0  27/0  15/0  L = 0; N% = 55.6%

Table 4. Inquiry 12029. Diffusion of products
of fission of uranium dioxide at high temperature
(uranium dioxide, fission products, diffusion,
high temperature) S_r = 5.

No.  Descriptor description                 S     P     N     Calculations of losses and noise

1.   uranium dioxide + fission products +   1/2   1/2   0/0   L = 40%; N% = 0
     diffusion + high temperature
     Altogether                             1/2   1/2   0/0   L = 40%; N% = 0

Table 5. Inquiry 25005. Processing and burial of
liquid waste at an AES [Atomic Electric Power
Plant] and enterprises (liquid waste, removal of
waste, waste treatment) S_r = 20.

No.  Descriptor description                 S     P     N     Calculations of losses and noise

1.   liquid waste + removal of waste        0/17  0/19  0/2   L = 15%; N% = 10.5%
2.   liquid waste + waste treatment         0/13  0/14  0/1   L = 35%; N% = 7.14%
     Altogether                             0/19  0/21  0/2   L = 5%; N% = 9.5%

On inquiry 12001 the system has no losses, and 24 documents not
satisfying it are issued because their descriptor descriptions
contain the descriptor "power reactors."

On inquiry 12011, twenty documents not satisfying the inquiry are
issued by the system: on the descriptor "exploitation" in the 1st
modification, and on the descriptors "exploitation" and "reactors" in
the 2nd modification.

Of nine relevant documents only two are issued, and retrieval
numbers 06209, 06680, 07610, 09097, 09098, 09219, and 09223 are lost.

Direct comparison of the descriptor descriptions of the inquiry
and of the documents mentioned above uncovers the cause of the loss
of these documents by the system.

In all the above-mentioned documents the descriptor description
shows the descriptor "boiling water" and, separately, "reactor,"
whereas the descriptor description of the inquiry and its
modifications shows the descriptors "boiling-water reactor" and
"boiling," "reactor."

The same fact or phenomenon is described by indexers of inquiries
with one set of descriptors and by indexers of documents with another.
Therefore, a convention should be introduced on the use, in the
descriptor descriptions of both document and inquiry, of the
synonymic construction that most completely conveys the meaning;
this convention should be reflected in the instructions for indexing
documents and inquiries.

The descriptor description of inquiry 12019 assumes delivery of
material by the formula "all on zirconium and its alloys." If the
issued documents are examined from this point of view, then all 27
documents should be considered relevant, and the information noise
equal to zero. However, among these 27 documents there are documents
in which the share of useful information on the given inquiry is small
relative to the complete information of the document. An example is a
144-page report with retrieval number 06215, which deals with
experiments on a reactor with liquid-metal fuel and whose descriptor
description contains the descriptor "zirconium."

On inquiry 12029 two documents are lost. The cause of the loss of
these documents is analogous to that described in examining inquiry
12011, i.e., different forms of recording, in the document and in the
inquiry, of two identical relations of ideas. Thus, the descriptor
description of document No. 09092 shows the descriptor
"thermodiffusion," and that of document No. 09268 shows the descriptor
"liberation"; in the inquiry there appears only the descriptor
"diffusion." An inaccuracy was therefore allowed during modification
of the inquiry: the descriptors "thermodiffusion" and "liberation"
shown in the documents must appear in modifications of the inquiry.
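
A minimal sketch of such a modification follows, assuming a hand-built table of near-synonyms. Only the three descriptors "diffusion," "thermodiffusion," and "liberation" come from the text; the table structure, function names, and the full descriptor set of the sample document are hypothetical.

```python
# Hypothetical synonym table: inquiry descriptor -> acceptable document descriptors.
SYNONYMS = {"diffusion": {"diffusion", "thermodiffusion", "liberation"}}


def expand_inquiry(descriptors, synonyms=SYNONYMS):
    """Replace each inquiry descriptor by its set of acceptable alternatives."""
    return [synonyms.get(d, {d}) for d in descriptors]


def matches(expanded_inquiry, document_descriptors):
    """True if the document carries at least one alternative for every descriptor."""
    doc = set(document_descriptors)
    return all(alternatives & doc for alternatives in expanded_inquiry)


inquiry = ["uranium dioxide", "fission products", "diffusion", "high temperature"]
# Assumed descriptor description of a document indexed with "thermodiffusion".
document = {"uranium dioxide", "fission products", "thermodiffusion", "high temperature"}

print(matches([{d} for d in inquiry], document))   # False: literal comparison loses it
print(matches(expand_inquiry(inquiry), document))  # True: modified inquiry finds it
```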

For an analogous reason, on inquiry 25005 the document with
retrieval number 065^0 is lost: "processes of separation" versus
"waste treatment," and "radioactive waste" versus "liquid waste," are
different forms of recording of two ideas close in meaning.

Conclusions 


A system has been developed for the automatic distribution of
information according to standing (on-duty) inquiries of users
(developers and scientists) from a constantly supplemented fund of
information documents (reports of foreign centers and firms and of
domestic scientific research organizations).

The basis of treatment of information documents is a descriptor
dictionary on the subjects of the branch, containing approximately
1000 basic terms and ideas.

The retrieval patterns of the parts of the document isolated by
meaning (information blocks) are transferred to information maps,
which bear, besides a list of descriptors, universal addresses and
retrieval numbers, and a bibliographical description and annotation
(abstract) of the corresponding part of the document.

To decrease noise during retrieval, grammar elements are introduced
into the descriptor description (separation of object and theme, and
definition). To decrease losses, special attention is paid to the
composition of inquiries. The inquiry formed by the user is divided
into subinquiries, taking into consideration the necessary
hierarchical and associative links.

The descriptors of the dictionary are united in semantic groups
(semantic fields), allowing replacement of synonyms, terminological
groups, and the ideas above, below, and next to a given one. To define
inquiries more precisely, feedback with the user is used (coupons and
personal conversations).

Retrieval is realized with the help of the "Minsk-22," on the
basis of rented machine time, by direct comparison of the descriptors
of the set of subinquiries introduced into memory with the
descriptors of a portion of the documents introduced into the machine.

Results of field testing of the system showed that noise accounts
for about 70% with insignificant losses.

In further refinement of the system, provision is made for
strengthening the grammar and introducing weight categories of
individual descriptors into inquiries.

APPLICATION OF DOMESTIC PUNCH-CARD MACHINES AND THE
"MINSK-22" FOR COMPOSITION OF A DESCRIPTOR INDEX

A. I. Nadtochiy, L. N. Krasovskaya, L. P. Martynov,
and N. N. Mikheyev

In the work of information services at present, for operational
information and information retrieval, wide application is found for
indices of information materials composed on the basis of key words
in the context of the title (the "key words in context" system, KWIC),
and indices on the basis of key words outside the context (the KWOC
system). The latter form of index started to be applied abroad
relatively recently, for the purpose of increasing the effectiveness
of application of indices [1]. Indices of the KWOC type are
considerably more convenient in work than indices of the KWIC type,
since they facilitate examination and retrieval of the sources of
information included in the index.

Indices of this type considerably reduce the time needed to bring
information to users and make it possible to find documents from
input data (key words or descriptors) more rapidly.

Indices of the key-word type have found wide application in
foreign atomic-energy centers. Examples are Physindex [7], the index
of information materials put out by the Atomic Energy Commissariat of
France, the Gipsy system [8], developed by the International Atomic
Energy Agency, and others.
Such indices are used for various information materials (the list of
dissertations at the National Laboratory in Oak Ridge (the United
States) and others) [9].

In solving the problem of mechanized data processing, an index of
the descriptor type was selected.

Selection of a descriptor-type index similar in form to KWOC was
prompted by the available program for creating a fund in inverted
recording for the IPS, realized at the TsBNTI on the "Minsk-22."
Furthermore, a descriptor-type index, as compared to an index composed
on the basis of key words of the title of documents, is based on deep
indexing of documents by descriptors and, as one should expect, should
ensure a high probability of finding the necessary information
materials.

Besides what has been mentioned, in selecting an index of the
descriptor type it was taken into consideration that indices reduce
consumption of paper as compared to card files and improve access to
the fund during retrieval.

The  General  Flow  Chart  of  Composition 
of  a  Descriptor  Index 

A descriptor index is composed in the following sequence. For
every document an information card of the abstract type is composed.
To be more exact, the information card used is the one composed for
the "SARI-1" information-retrieval system operated in the Central
Bureau of Scientific and Technical Information on Atomic Science and
Technology.

A report on this system was presented at the present conference.
The information card contains a retrieval pattern in the form of a
number of descriptors and the bibliographical characteristics of the
document (name, author, volume, number, page, location of document,
etc.). Primary documents are processed using a dictionary of
descriptors developed at the TsBNTI. Information cards composed from
available materials go to punching. Punching, i.e., the transfer of
data from the information card, is carried out twice.

In the first stage of the work the retrieval criteria are
transferred to punched cards, and in the second stage (after an
inverted recording is obtained on the computer) the name of the
document is punched. In the first stage the number of the document
and the descriptors in the form of codes (the retrieval pattern) are
transferred to punched cards.

The whole deck of punched cards is divided into subdecks (455
punched cards each, which is explained by insufficient working-storage
capacity), and they are put into the "Minsk-22."

According to the developed program, a recording of documents by
the descriptors entering each document is created in the computer.

The inverted recording "descriptor-documents" is created
according to the following algorithm:

1. The putting of punched cards into the computer and the
recording of information about documents in the form: after every
document number its descriptors are located sequentially. Recording is
in the binary-decimal system. For every document or descriptor number
there is an individual working-storage cell.

2. The ordering of descriptors inside documents and the creation
of a table of initial addresses of location of documents in working
storage.

3. Creation of a single list of descriptors on all documents.

4. Formation of the inverted recording of documents (for every
descriptor all numbers of documents in which it is encountered are
written out).

5. Printing of the obtained recording "descriptor-documents" on
paper tape.
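
The five steps can be sketched in a few lines. This is only an illustration of the inversion itself: it ignores the binary-decimal storage cells and address table of the original program, and all names and sample data are hypothetical apart from document 52112 and descriptor code 0201, which appear in the text.

```python
from collections import defaultdict


def build_inverted_record(documents):
    """documents: dict mapping document number -> iterable of descriptor codes.

    Returns a dict mapping descriptor code -> sorted list of document numbers.
    """
    # Step 2: order the descriptors inside each document.
    ordered = {doc: sorted(set(codes)) for doc, codes in documents.items()}
    # Steps 3-4: collect every descriptor once and list its documents.
    inverted = defaultdict(list)
    for doc in sorted(ordered):
        for code in ordered[doc]:
            inverted[code].append(doc)
    return dict(sorted(inverted.items()))


# Step 1: the "deck" as read in (sample data, mostly invented).
deck = {52112: ["0205", "0201"], 17114: ["0201"], 17120: ["0202"]}

# Step 5: print the "descriptor-documents" recording.
for code, docs in build_inverted_record(deck).items():
    print(code, docs)
# 0201 [17114, 52112]
# 0202 [17120]
# 0205 [52112]
```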

The work of the program in creating an inverted recording is in
many respects similar to the work of the program of inverted
retrieval on the "Ural-2" described earlier [2].

The form of the recording "descriptor-documents" on paper tape
obtained from the "Minsk-22" is shown in Fig. 1, where the number
+002010000 designates the code of the descriptor "charged particles"
(0201), and +000052112 designates the code of the reference number
of the document (52112). The descriptor "chelaty" [chelates] is code
0202, and the descriptor "reaction" is code 0205.


[Figure: paper-tape listing in which each descriptor code line
("charged particles," "chelaty," "reactions") is followed by the
codes of the documents indexed under it.]

Fig. 1. Form of inverted recording "descriptor-documents"
on paper tape.


According to the obtained recording, the information cards are
selected from the fund, and from them the name of each document is
punched according to the developed mock-up shown in Fig. 2.


Fig. 2. Mock-up of punched card for composition
of descriptor index.


As can be seen from the figure, the punched card carries:

1. The descriptor code, a four-digit number (columns 1-4).

2. The text of the document name (columns 5-74).

3. The retrieval number of the document (columns 75-79).

4. The number of the punched card (column 80).

During the composition of the index the eight-digit code is broken
up into two four-digit codes, and the basis is the first descriptor.

The main part of the punched card is for the text of the document
name.

Since the text of the title will not always fit on one punched
card, provision is made for continuation cards, i.e., the text of the
name of a document is punched on several punched cards, which are
interconnected by the number of the punched card.
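
The card layout and the continuation scheme can be sketched as follows. The column boundaries are those listed above (1-4, 5-74, 75-79, 80); the function names, the padding conventions, and the limit of nine cards per title (one digit in column 80) are assumptions made for the illustration.

```python
TITLE_WIDTH = 70  # columns 5-74


def make_cards(descriptor_code, title, retrieval_number):
    """Spread a document name over as many 80-column cards as it needs."""
    chunks = [title[i:i + TITLE_WIDTH] for i in range(0, len(title), TITLE_WIDTH)] or [""]
    if len(chunks) > 9:
        # Assumption: column 80 holds a single digit, so at most 9 cards.
        raise ValueError("title needs more than 9 continuation cards")
    return [
        f"{descriptor_code:>4}{chunk:<70}{retrieval_number:>5}{n}"
        for n, chunk in enumerate(chunks, start=1)
    ]


def parse_card(card):
    """Split one 80-column card image into its four fields."""
    card = card.ljust(80)[:80]
    return {
        "descriptor_code": card[0:4],       # columns 1-4
        "title_text": card[4:74].rstrip(),  # columns 5-74
        "retrieval_number": card[74:79],    # columns 75-79
        "card_number": card[79],            # column 80
    }


cards = make_cards("0201", "A" * 100, "52112")
print(len(cards), [len(c) for c in cards])   # 2 [80, 80]
print(parse_card(cards[1])["card_number"])   # 2
```

Sorting the cards on columns 75-80 then brings the continuation cards of one document together in order.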

During the punching of names of documents great difficulties
are encountered with notation. An alphameric tabulator lacks many
signs, such as /, !?, and others, and furthermore, many formulas in
names are not standardized; therefore, in those cases in which verbal
recording was unsuccessful, the formulas were inscribed manually in
the tabulogram.

The following stage of work is the selection of punched cards
according to the inverted list in alphabetic order. The documents
belonging to each descriptor are taken. The sampling is done on an
S-80-5M sorter. Sorting is by numbers of documents and punched cards
(columns 75-80).

At present the name of the document is punched once. Therefore,
if the document is encountered in the inverted list several times
(and this is normal, since the number of descriptors describing a
document averages 8-12), it is necessary to do extensive work on
sorting the punched cards.

In further work it is proposed to improve the available technology
by duplicating the punched cards of identical documents, punching in
columns 1-4 the numbers of the descriptors belonging to these
documents, for the purpose of sampling the necessary cards.


For printing the text of descriptors in the index according to
their codes, a mock-up of the punched card has been developed that
differs from the mock-up for printing the name of the document.

The whole field of the punched card, except for the first four
columns, is for the text of the descriptor. The name of the descriptor
is printed according to the inverted list mentioned at the beginning
of the article.

The first four columns are left blank to offset the recording of
the descriptor on the tabulogram, for better readability of the
index.

After selection of the punched cards with names of documents and
the punched cards on which names of descriptors are recorded, a
mock-up is printed on an alphameric tabulator: a form from which
the descriptor index is printed with a 1:0.7 reduction.

Figure 3 shows a page of the descriptor index. In the left part
of the page there are descriptors, and in the right part the
(retrieval) numbers of information cards.

The index is reproduced by means of an operational polygraph.

The form for reproduction is prepared on an "ERA."

In the first, experimental index 1000 documents are included,
composed on the basis of abstracts of reports on experimental
works carried out by organizations of the State Committee on Use of
Atomic Energy.

The purpose of the index is to reduce the time of getting
indicative information to the user and make it easier to retrieve
necessary literature.

From a descriptor or combination of descriptors the user finds
the necessary document number. After appraisal he can, through the
reference and information fund of the TsBNTI, obtain an information
card, original, or copy. Certain sources of information are issued
in the form of microcopies.

Conclusion 

To speed up getting documents to users, a descriptor index has
been developed at the TsBNTI. The index is intended both for users
working directly in the field of atomic science and technology and
for users of other departments that need information of a similar
form.

Output of descriptor indices is one of the steps in the creation
of an effective system of information service.

Existing computerized IPS can at present service only a certain
circle of users, limited by the possibilities of the information
service and by economic considerations.

The descriptor index should allow wider use of the existing
data-processing system and give a wide circle of consumers retrieval
by index and use of the information materials available in the
reference and information fund of the TsBNTI on the use of atomic
energy.

Practice shows that the time of output of the index with the help
of computers and punch-card machines can be considerably reduced as
compared to the time for an index prepared completely manually.

Wide application of indices of the descriptor type to various
information materials (reports, journals, and technical specifications
and records) will allow improving the information service of
information users.